QUALITY-CONTROL OF INFORMATION:
On the concept of accuracy of information 
in Data-Banks and in Management Information 
Systems. 


This paper is intended to assist those who 
develop, use, maintain, audit, or in general 
may be affected by so-called Data-Banks and 
Management Information Systems. 


A first purpose of the paper is to stress the
importance of accuracy, or more generally of
quality of information. Data-Banks and Manage- 
ment Information Systems may typically imply 
some processing performed on externally 
obtained measurements and pre-processed inputs, 
while their outputs may be stored and used by 
people in unknown contexts.


To the extent that this happens it becomes 
more difficult to expect that the quality of 
information can be represented by a measure of 
effectiveness of system and subsystems in 
relation to operational goals. Thus, a second
purpose of this paper is to suggest
some possibilities of attaching a measure of 
quality to discrete items of information, such 
as coded observations and intermediate compu- 
tational results. 


The paper consists of five chapters supporting
five sets of statements regarding the consequences
of present practices, and what can be done to
implement the most necessary improvements.
Illustrative examples emphasize administrative
applications such as in public planning
and in industrial manufacturing.


KEY-WORDS 


Accuracy, Integrity, Privacy, Secrecy, Quality, 
EDP-Auditing, System-Management, Data-Management 
FOREWORD


I started the study reported in this paper with a
feeling of curiosity and personal challenge
originating from particular problems experienced
during my professional activity in industry.


After many months of work I felt disappointment and
amazement at not being able to frame a scientific
statement of the problem and, of course, much less
a solution to it. The problem apparently "did not
exist" according to the available literature and
reports on current research.


My fortuitous contact with the writings of C.W.
Churchman initiated a period of deep satisfaction
and allowed me to organize my subsequent work with
a feeling of being on the right track.


I terminate this study in a fourth mood: strong
apprehension, because of the implications of my
conclusions with respect to the possible social
impact of information systems for public planning
and administration. The same applies with respect
to the possible social impact of certain directions
of current sociological and psychological research.


I hope that I will be proved to have been wrong. 
In the meantime my strongest desire is to stimulate 
others to further study of these issues.


I want to thank all the numerous people who in many
different ways helped and encouraged me to accomplish
this work. An attempt to enumerate them would
probably result in unintentionally neglecting somebody.


Therefore, I will explicitly thank only Börje Langefors,
who first showed me the need for and possibility of
scientific systems thinking, and whose intellectual
courage and open-mindedness made this work possible.


Secondly, I want to acknowledge my intellectual debt 
to C. West Churchman whose writings opened my way 
towards a scientific and human understanding of the 
issues related to this study. 


March 1972 


Kristo Ivanov 

R. Almstroemsgatan 3 
S-113 36 Stockholm 
Sweden 
INTRODUCTION


The motivation to start this study originated from 
the results of an investigation led by the author 
at the time he had managerial responsibilities in 
the engineering department of a manufacturing plant. 


The investigation was directed towards the analysis 
of errors in the data-base describing the manufactu- 
red products. Many of the errors turned out to be 
other than the conventional "input" errors like 
transposition, substitution of digits, etc. As a mat- 
ter of fact we felt that many of these errors had at 
some time to be committed in order to keep the system 
going, and they should perhaps not be called "errors" 
in the conventional meaning of the word. A proper ap- 
preciation of their nature led us to the domains of 
systems design, integration, data identification, etc. 


This implies that our study is CRITICAL, that is, it
presupposes that things are NOT going well in the 
area of systems design and operation. Thus, our expe- 
rience has determined the general orientation of our 
work and it has furnished rich unstructured empirical 
material which was not explicitly utilized in this 
paper. 


The graph on the next page gives an overview of the
structure of this paper. Chapter 1, based on our
presuppositions, experience and observations, results
in summaries of what the conventional literature on
electronic data-processing (EDP) says about errors
and quality of information. In a similar way,
chapter 2 results in summaries of empirical
quantitative results on error rates. With the
assistance of some more theoretical and scientific
literature, chapter 3 integrates the results of the
two earlier chapters, suggesting the typical
consequences and nature of a limited understanding
of the error-quality issue as evidenced by the
reviewed literature.


Chapter 4 draws heavily upon scientific literature in 
order to allow a scientifically justified definition 
of some aspects of quality of information in a way 
that is consistent with the suggestions set forth in 
chapter 3. Finally, chapter 5 uses the newly defined 
aspects of quality, refines them, and evaluates them 
in the light of the earlier practical-empirical results 
of chapter 2. The chapter results in particular recom- 
mendations on how and where to concentrate the quali- 
ty effort of an organization, and may therefore be 
seen as the core of a "handbook for quality-control
of information" assisting the designers and users of
a data-bank or management information system.


More detailed contents of this paper may be found in 
the previous list of "contents". 
Information-precedence graph illustrating
a rough overview of this paper in terms 
of relations between the presuppositions 
of this study and the conclusions from 
chapters 1 to 5. 


This paper also contains several appendixes containing
both material originated by us and material written by
other authors which was selected and sometimes heavily
edited by us. This should be kept in mind when
evaluating the material of others, since our editings
are out of context and can never do full justice to
the authors of the original text.


Exact citations are always enclosed between quotation
marks. Both the extensive citations and the appendixes
are judged by us as necessary for a proper
understanding of this paper, which spans a very wide
range of professional literature, most of it not
readily available at minor locations. Our references
to "(Casual) Documents", sometimes abbreviated "CD",
refer to the corresponding items in appendix A1. They
originate from personal notes that we wrote in the
course of the years, based on literature which we are
not able to identify. We included them because they are
valuable as testimony of thinking found in the business
and administrative community.


A discussion of the method for our work is presented
in appendix A12. We feel that the full implications of
the discussion are better realized after having read
the main body of this paper. At this time it will
suffice to remark that we did not judge it convenient
to attempt the use of a precise terminology. Our
understanding of what is meant by information systems
corresponds to the ideas set forth in Sweden by
B. Langefors and in the USA e.g. by C.W. Churchman:
information is used for decisions. For the rest, the
reader should not assign any particular importance to
the shifting use of terms except for what may be
inferred from the context: the meaning of the words
used will emerge in the course of the arguments in
the paper.


For example, we alternate between the words Data-Banks,
Information Systems, Management Information Systems; 
Accuracy, Quality; Model, Theory; Measurement, Obser- 
vation; Administration, Organization, etc. 


More explicit statements in the text are emphasized with
the mark D> at the left hand margin. Such statements
are often the basis for the specific conclusions in
the corresponding chapter.


ON ACCURACY 


There is one concept in the literature on electronic
data-processing which appears to be of fundamental
importance, especially in the context of information
systems for administrative applications.


It is the concept of ACCURACY. 


We say that it appears to be of fundamental impor- 
tance because the word is found whenever somebody 
wants to declare the importance or value of a 
so-called information system to be developed, as 
well as of such a system that is already installed 
and operational. Furthermore, the word is also 
found in the context of emphasizing the importance 
of correct input to an already developed and 
installed system. 


In order to determine the desirability of further 
research on the nature of accuracy, a review was 
made of the professional literature dealing speci- 
fically with electronic data-processing. The 
review included books, periodicals, research reports,
instructional booklets of computer manufacturers
and internal company reports from places to which
the author had access in the course of his
professional activity.


No intentional "a priori" selection was made of 
which literature out of the above would be more 
closely examined. Through browsing, attention was
focused on those publications that stated something
about the nature of accuracy or about concepts
intuitively related to the accuracy issue.


ON ACCURACY AND QUALITY 


Appendix A1 displays an edited selection of such
a review with a view towards answering the questions
"what is accuracy ?", and "is it in some sense
important - justifying further research ?".


The appendix was created for the convenience of 
the readers, bringing together some material that 
was spread out in many different sources. The text 
had to be taken out of context and edited, which 
should be kept in mind because of the danger of 
misunderstandings and of not doing justice to the 
authors of the original text. 
Consideration of appendix A1 introduces a multitude
of new concepts intuitively related to accuracy. They
are listed below. We have completed the list with
those terms which are known from other occasions,
including those which denote aspects intentionally
excluded from our main study, like security.


Accuracy            Usefulness          Trueness
Value               Confidentiality     Relevance
Validity            Consistency         Reasonableness
Dependability       Authenticity        Pertinence
Integrity           Completeness        Acceptability
Correctness         Reliability         Refinement
Precision           Degree of Detail    Approximation
Timeliness          Recency             Currency
Freedom from Error  Controllability     Rightness
Exactness           Goodness            Accessibility
Quality             Availability        Security
Secrecy             Privacy             Coverage


For the purpose of further reference in this paper,
we will often choose the word QUALITY for representing
roughly the set of all the above words. In this sense,
Quality stands for a generic attribute of information.


ATTRIBUTES OF INFORMATION 


A closer analysis of the material in appendix A1
may be performed in attempting to answer the following 
questions. 


1. Is the particular concept defined ? 


2. Is any justification given on whether it is, in 
some sense, important ? 


3. Are any recommendations given about what can be 
done in order to improve the quality ? 


Out of the about twenty sources in appendix A1, fewer
than ten appear to have attempted to define quality.
The attempts are made in terms of conceptual rather
than operational or functional definitions: i.e. the
definition relates the concept being defined to one
or more other concepts and generally takes a form
similar to that of dictionary definitions.


For instance, Carr (1970) apparently equates
RELIABILITY with CONSISTENCY. Lauren (1970) suggests
that RELIABILITY is the same as ACCURACY. On the other
hand IBM (F20-0006) suggests that RELIABILITY and
ACCURACY are two distinct concepts.
Orlicky (1969) implies that QUALITY is FREEDOM FROM
ERROR but he does not offer a further definition of
error. Since INTEGRITY is described by him as freedom
from error, completeness, and timeliness, one could
conclude that quality is only one of the aspects of
integrity.


Rodin (1971) relates quality to the concept of IDEAL. 
The ideal value corresponds to the COMPLETELY EXACT 
AND CURRENT value. Since he defines quality also by 
its components COMPLETENESS, PRECISION, CORRECTNESS, 
AND CURRENCY, we could conclude that his concept of 
exactness is equivalent to a synthesis of the three 
concepts of precision, completeness, and correctness. 


Montelius et al. (1970), Rodin, and a Casual
Document (1964) make use of statistical terms such as
RANDOM, ERROR LIMITS, and STANDARD DEVIATION. They do
not, however, develop the meaning of these terms in
the particular application. Since such words refer to
very elusive and misused ideas, their use by the
authors should be submitted to a critical evaluation.
It would have been necessary to provide, for instance,
a reference to scientific-statistical literature or
a closer specification of how to obtain the relevant
observations.


Blumenthal (1969), in a book wholly dedicated to
planning and development of management information
systems, does not make any reference to the problem of
quality, or we were not able to find any such reference,
unless it is considered as implied by a successful
design. Quality is included neither in the input data
definition nor in the analysis of user requirements.
The author apparently considers quality specification
of data as a meaningless question since data are
defined by him as "uninterpreted raw statements of facts".


Carr (1970) implies that legal and administrative 
applications of data are not decision making and that 
their data requirements do not generate data with good 
quality. From his formulation one is led to think that 
bad quality in terms of observation errors results
to a large extent from the implied applications of
such data. In spite of the vagueness of the statements 
this suggests important objections against Blumenthal's 
conceptualization of DATA, without however assisting 
in the definition of the terms. 


J.C. Emery makes quality dependent on ACCURACY.
Accuracy is seen as a QUALITATIVE characteristic of
information which attempts to substitute for the
quantitative estimates of information value at lower
levels of decision-making. Emery seems to imply that
at high levels of decision-making neither accuracy nor
information value can guide design decisions for
development of information systems. The author
apparently differentiates between accuracy of input
data and REFINEMENT of the estimates of input variables
that are critical in determining payoff. For the
former, PERFECTION or absolute accuracy is only a
question of costs and not limited by the nature of
human knowledge; this is apparently what Emery implies.
For the latter he suggests REFINEMENT of estimates,
that is, improved accuracy, which by the way may also
be limited by the INHERENT STATISTICAL VARIABILITY of
reality. Emery, however, does not define accuracy,
errors and the other terms used.


W. Edwards et al. propose that quality of information
be substituted by quantity but, as far as we could see,
do not define quality. This is particularly troubling
when one knows that there are cases in which the
proposed Bayesian probabilistic models are being used
in military information systems. It is legitimate to
wonder what the assurances mean that, for instance,
a nuclear attack cannot be triggered BY MISTAKE or
BY ERROR!


Sundgren & Lundin do not define quality either, but
they attempt to consider it as one among other goals
of a public data-bank, and then they proceed to show
implicitly its nature by means of its relationships
to the other goals. The authors, however, do not
justify their allocation of the quality goal to the
government: it could be conceived as originating
also from the citizen or from the organizations.


Montelius et al. (1970) state that the input elements
must be regarded as NEUTRAL from the VIEWPOINT of the
information process, where the process is chosen on
the basis of experience and error-controls will be
based on CHALLENGING in some way the PRESCRIBED
STANDARD PROCESSES. The authors, however, do not
develop the ideas of neutral, control of standards etc.
Therefore, their definition of error is also
indeterminate and vague.


Owsowitz & Sweetland (1965), in spite of adopting an
ambitious approach in terms of PREVENTION of errors,
apparently consider it possible to limit their study
to INPUT errors and disregard the correctness of the
information processes. Their definitions of accuracy,
validity and consistency are vague in the sense that,
for instance, they do not explicitly state what
procedures should be followed in practice to determine
the validity of a recording mechanism.


Vagueness and circularity of definitions are, in our
opinion, also characteristic of Weinmeister's approach
(1971) and of N.P. Edwards' approach. The latter,
for instance, refers to the ACCURACY of a cost estimate,
ACCURACY of the command and control system and of its
subsystems, ACCURACY of the raw data, ACCURACY of the
value of timeliness, accuracy, reliability,
ACCURACY of the knowledge of the exact present
location of the target, and ACCURACY and age-quality
of the knowledge of the target's last position.


ON THE IMPORTANCE OF QUALITY 


Out of the reviewed literature, the Casual
Document from 1966 (Casual Documents will be referred
to by the abbreviation "CD") states, for example,
that accuracy is the fundamental objective of
information systems.


IBM (F20-0006) states that accurate processing of data
means that the processing, besides being performed
without undetected errors and in accordance
with management's policies and instructions, FULLY
ACCOMPLISHES ITS PURPOSES.


CD (1964) seems to imply that accuracy, as well
as other attributes of information such as
timeliness and dependability, is a component of its
value.


The above thoughts lead us to the more general
and interesting matter of the relation between
the quality of information, its value and the
goals of a system. Emery touches on this by stating
that it is our inability to make quantitative
estimates of information value that forces us to
use the concept of quality in developing
organizational information systems.


It is apparent that from these points of view
the quality of information is of fundamental
importance for information systems. This statement
is made even more interesting by the possibility
that the value-impact, or more specifically the
economic impact, of quality problems may rapidly
increase because of the proliferation of so-called
data-banks and management information systems.


Especially to the extent that the sources of 
information are not the same as the users of 
such information after its processing by some 
system, and to the extent that the user or affec- 
ted population itself cannot be limited and defi- 
ned, no "data-management" will be possible. The 
impact of the quality problem may have serious 
consequences: this may turn out to be the case 
with many public information systems unless some 
scientific control is established proving the 
contrary. 
Physics, as a science, enjoys high status and
reputation. As an illustration, the importance
of quality of information may also be appreciated
by referring to the issue as it appears in the
physical science's information system:


Assume an engineer retrieving from a data bank
some technical data to be used in the construction
of a bridge: if he gets for instance the
tensile strength of a certain kind of steel, without
any indication of the accuracy and precision
of the figure, he will not be able to use such
steel in his work. Or alternatively, if he uses
the steel anyway, say nine out of every ten bridges
he builds will prove not to bear the load for
which they were designed!
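
The bridge example can be illustrated by a minimal sketch. It assumes,
hypothetically, that each stored figure may carry a standard uncertainty,
and that the engineer designs against a conservative lower bound (the value
minus k standard uncertainties). The names Measurement, lower_bound and
bears_load, the factor k and all numbers are illustrative assumptions and
are not taken from any of the reviewed sources.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Measurement:
    """A stored figure plus an optional statement of its quality."""
    value: float                      # e.g. tensile strength in MPa
    std_uncertainty: Optional[float]  # same unit; None = quality not stated


def lower_bound(m: Measurement, k: float = 2.0) -> float:
    """Conservative design value: value minus k standard uncertainties.

    Raises ValueError when no quality indication is attached, because
    the figure then cannot be used responsibly in a design.
    """
    if m.std_uncertainty is None:
        raise ValueError("no accuracy indication attached to the figure")
    return m.value - k * m.std_uncertainty


def bears_load(strength: Measurement, required: float) -> bool:
    """True if the conservative strength still covers the required stress."""
    return lower_bound(strength) >= required
```

With the assumed figure Measurement(400.0, 15.0) the conservative value is
370.0, so a required stress of 360.0 is covered and one of 380.0 is not;
Measurement(400.0, None) cannot be used at all, which is the engineer's
situation described above.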


Thus it is apparent that e.g. in "general" data-banks,
the quality of information cannot be assumed
to be less important. We would rather say
that, unless somebody proves the contrary, the
quality of general, organizational or social,
information is still more important than in the
physical sciences since the weaker theory building
prevents testing the consequences of the use of
information with inadequate quality. It is
difficult to show the collapse of a social or business
"bridge" and to put it in relation to its cause
or "steel". Nor can weaker theory always be
compensated through more "pure or raw facts" or direct
observations: a country's unemployment figures
stored in a public data bank are not more direct
or basic facts than the physical properties of
steel stored in a technical data-bank.


There are indications that quality is in bad
shape even in the physical sciences: Branscomb
(1968), now director of the USA's National Bureau
of Standards, makes this very clear while at the
same time giving a hint about the importance of
the issue. He refers to research on a particular
physical problem, cross sections for electron
collisions, and he suggests a method for saving
a substantial part of 44 million dollars in the
course of a four-year period: "Simply by not doing
the work at all unless it is written up in such
a way that it can be evaluated and therefore
become useful" (1968). If applied to data banks
and information systems the same statement would
read: "Do not generate or store information for
information systems at all unless its quality is
specified in such a way that it can be evaluated".
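
The rule derived from Branscomb's statement can be sketched as a toy store
that refuses any item arriving without a quality statement. This is only
an illustration under stated assumptions: the classes QualitySpec and
DataBank, the fields method and error_limit, and the example keys are
hypothetical and do not describe any existing system.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class QualitySpec:
    """Hypothetical quality statement accompanying a stored item."""
    method: str         # how the figure was obtained
    error_limit: float  # stated limit of error, in the item's own unit


@dataclass
class DataBank:
    """Toy store that accepts an item only with a usable quality statement."""
    items: Dict[str, Tuple[float, QualitySpec]] = field(default_factory=dict)

    def store(self, key: str, value: float,
              quality: Optional[QualitySpec]) -> None:
        # Reject items whose quality is missing or cannot be evaluated.
        if quality is None or not quality.method or quality.error_limit < 0:
            raise ValueError(f"item {key!r} rejected: quality not specified")
        self.items[key] = (value, quality)
```

In this sketch an item such as a steel strength stored together with
QualitySpec("tensile test", 15.0) is accepted, while an unemployment
figure offered with no quality statement raises ValueError and is never
stored.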


In spite of its importance, then, the quality
problem is not properly understood or is ignored
in the context of well established sciences.


Also Eisenhart (1968) and Hallert (1968, 1970)
show, through their attempts to explain quality
to natural scientists and technicians, that such
explanations are badly needed in broad areas
outside of our immediate concern with ADMINISTRATIVE
data-banks and information systems. Their emphasis,
however, is directly relevant to design of data-banks
containing, for instance, information about
physical quantities. Since much of the experimental
and theoretical work in so-called ARTIFICIAL
INTELLIGENCE and fact-answering or fact-deducing
systems is aimed initially at the simpler and
better known physical reality, one may wonder
whether such projects make allowance for storing
and processing quality specifications.


It comes, therefore, quite naturally to learn
that the situation is much worse in national
and business economic statistics. This comes
very close to the emphasis which we have given to
this study. It may be only a question of time
before, in all industrialized western countries,
such economic statistics are regarded as information
processing of "facts" stored in public data-banks
and information systems. A whole book by
O. Morgenstern, "On the Accuracy of Economic
Observations" (1963), may be regarded as a qualified
massive document on the immense importance of the
quality of information.
SOME COMMENTS ON THE CONTENTS OF
THIS CHAPTER. SUMMARY


In general, the above kind of reasoning is what we
think can be accomplished from an analysis of
the EDP literature and its definitions regarded as
CONCEPTUAL definitions, sometimes also called
constitutive or contextual. It is apparent that there is
no agreement among the various authors: each one of
them brings his own particular experience and
intuition without framing his ideas in basic
considerations of scientific method.


It is difficult to see that a further analysis along
the same lines as above would be fruitful for our
purposes. We could go on showing that some literature,
like EDP Analyzer (Feb. 1968), proceeds by listing
major causes of poor data, while other sources, like
Casual Document (1970) or the auditing literature
represented by IBM (F20-0006) and Davis (1968), just
propose what should be done to improve quality in terms
of detailed EDP validation techniques or principles of
organization. The implied scope of quality thinking
ranges from trivial keypunching errors to the almost
"everything" of the broad and vague concept of
DATA-MANAGEMENT. One wonders how such an ambitious and
vague data-management as suggested by Casual
Document (1970) can be enforced on a universal social
basis for the purposes of public data-banks!


"Self-evident" truths turn out to be not self-evident
at all. For example, the elimination of the human
element from the input data stream is often assumed to
result in better accuracy of the input. This is
suggested, for example, by Blumenthal (1969, p.175) and
by J.C. Emery (1969, p.38). J.P. McNerney (1961), on the
other hand, in a very well justified and interesting
study suggests that the opposite may indeed be true
in certain circumstances. How to define "the
circumstances" ? C.W. Churchman (1968b, p.189) suggests
some of the deep implications of this issue:
"objectivity" obtained by putting more and more of the
act of observation into hardware such as computers and
physical instruments greatly limits what can be
observed to the realm of PHYSICAL reality.


After a review of the EDP literature we find ourselves
in really bad shape. Nowhere are we told, in an
explicit manner, how to measure quality and for what
purposes. We are not able to use the implicit definitions
in their present form as a basis for binding
negotiations on desired and committed quality levels
between a "buyer" and a "seller" of information. To the
extent that the authors offer recommendations on what
should be done in order to improve quality, we do not
know why we should place confidence in their advice;
and even if we placed confidence in and implemented
their advice, we would not be able to evaluate the
results of their recommendations.


We state, therefore, that the available EDP literature
does not define QUALITY OF INFORMATION, in the sense
that it does not explicitly support the formulation
of operational definitions of the concept. The review
gives at best some kind of insight: there appears to
be some consistency among the authors in identifying
a TIME-RELATED aspect of quality that goes under the
denominations of timeliness, recency, currency; other
aspects are not explicitly time-related. Furthermore,
it appears that quality may be associated either with
the information itself or with the system generating
such information. We are not able, however, to use
these insights in their present form.


In the face of the discouraging results of our review,
we turn to the literature on scientific method in
order to see what is said about definitions and
operational definitions.


In the context of discussing what the CONTENT of
conceptual and operational definitions in science should
be, Ackoff (1962, p.146) states: "In the newer branches
of science, in particular, it has become increasingly
common to define one concept in terms of others which,
if anything, are less well understood than the one
being defined and whose operational significance is
even more obscure." Later (p.150), Ackoff suggests
five instructions for the build-up of definitions,
which we shall roughly follow in the spirit of this
paper. The basic idea, as we see it, is that
definitions cannot be created out of thin air; they must
be anchored in some established scientific knowledge
or theories.


As Churchman (1948, p.159) summarizes it: "traditional
empiricism has misread the significance of conceptions
or general ideas; it has connected them with experience
of the actual world; it has connected the origin
and validity of general ideas with antecedent
experience. According to it, concepts are formed by
comparing particular objects, already perceived, with one
another, and then eliminating the elements in which
they disagree and retaining that which they have in
common. Concepts are thus simply memoranda of
identical features in objects already perceived" (cited
from J. Dewey's "The Quest for Certainty", 1929).
Traditional empiricism has thus failed to realize the
important role of generalizations; its "ideas are dead,
incapable of performing a regulative office in new
situations." (same source).


Continuing his integrating discussion of empiricism
versus rationalism, Churchman goes on citing Dewey:
"the basic error of traditional theories of knowledge
resides in the isolation and fixation of some phase
of the whole process of inquiry in resolving
problematic situations. Sometimes sense-data are taken;
sometimes, conceptions; sometimes, objects previously
known. An episode in a series of operational acts is
fastened upon, and then in its isolation and
consequent fragmentary character is made the foundation of
a theory of knowledge".


We think that no comments are necessary except for
putting the question whether what we witness in the
EDP literature is a variant of traditional empiricism
or positivism, extensively criticized by Churchman
(1968b). If this is true then we have a basis for
explaining why we felt that we had got nowhere up to
now, and a basis for expecting that a "practical"
approach as attempted in the next chapter will also
raise problems of interpretation and generalization.
We terminate this chapter by consolidating earlier
statements in this chapter into the following.


CONCLUSIONS FROM THIS CHAPTER


1. The reviewed EDP literature does not offer
definitions of quality of information, in the sense that
no explicit support is found for the formulation of
operational definitions of the concept.


2. The quality of information is of fundamental
importance for the development and use of data banks
or information systems; this is the opinion implied
in the available EDP literature and it is also
implied by the lack of a scientifically justified
method for cost-benefit analysis of data-banks and
information systems.


This motivates an extension of our study into the next 
chapter. We will attempt there to bypass the theoreti- 
cal issues by inferring on quality from what has been 
and is practically done. 


WANTED: A PRACTICAL, REALISTIC, EMPIRICAL APPROACH 


The statements, good advice, "theories" and
definitions found in the previously reviewed EDP
literature were shown to be based on shaky scientific
foundations. However, they have presumably originated
from human experience with concrete problems.
After all, everybody will agree that there are
"errors" in the inputs to an EDP system, e.g. a wrong
address of a customer, a wrong quantity to be shipped
etc.


The EDP practitioner may, therefore, in a specific
situation ask for advice or investigations on how
to improve the accuracy of inputs to the system.
Something HAS TO BE DONE and CAN BE DONE, even
without "understanding" the whole issue or being able
to define what errors are.


In the context of our research it is therefore
tempting to hop a plane and invade some business
firm having accuracy problems with some installed
information system. We can take an army of
statisticians with us, who will gather lots of hard
data on the problem, talk with the people who
developed and use the system, and finally apply
statistical techniques and common sense to the data
in order to suggest improvements. The object of
investigation could be the accuracy of card punching
and verification. In more sophisticated installations
the object could be the accuracy of procedures
leading to the keying of input data into on-line
direct entry terminals etc.


IT TURNS OUT THAT MANY SUCH INVESTIGATIONS HAVE
ALREADY BEEN DONE. The results are, however, spread
out in publications ranging from the subject of
EDP to applied psychology and human factors. We have
made a review of such literature which may be
relevant to our purposes, and an overview is presented
in appendix A2 for the convenience of the readers.


If the literature shows in some sense reliable and
valuable material, we will be able to consolidate it,
obtaining a set of guidelines for improving the
quality of information, obtaining - at least
implicitly - some theoretical understanding of the
quality issue, and in any case reaching a conclusion
about the desirability and nature of further study
of the quality issue.


LITERATURE WITH EMPIRICAL QUANTITATIVE RESULTS 


The basic selection criterion for the literature
reviewed in appendix A2 was that something should
be stated on specific ERROR RATES in the context of
information. This would hopefully take us to some
implicit concept of ERROR and of QUALITY.
Furthermore we vaguely expect, departing from the
familiar context of quality control in industrial
manufacturing, that we might establish some "normal"
error rates which will assist us in the search for
methods for decreasing such rates.


The appendix consists of edited selections from the
referenced papers. The selection was made with
emphasis on the ERROR RATES rather than on abstracting
the whole paper. Although not always consistently,
we attempted to keep our own comments and heavier
editing aligned at the far left-hand side of the
page. To the extent that the authors applied
advanced statistical techniques, our comments do not
imply that we have critically analyzed the
calculations and found them to be correct.


Since the edited text is taken out of context, no
guarantee can be given that we do justice to the
authors: the readers must refer to the given sources
in order to evaluate the papers.


The review reached beyond the area of literature on
EDP, including more general and scientific
literature from such areas as theoretical analysis of
information systems, applied psychology, ergonomics
and human factors, statistical journals and research
in education. As a self-imposed limitation to the
scope of our work we have not included the area of
statistics applied to censuses, surveys, validity
and reliability of psychological tests etc.
We will later attempt to show that this does not
detract from the conclusions of this chapter.


The reviewed papers and our overview may be appre- 
ciated in terms of e.g. 


- the reference to quantitatively specified error 
rates (the basic necessary condition for being 
considered in the review) 

- the level of ambition, ranging from keypunch errors
as in Bürotechnische Sammlung (1956) to the
consideration of subtle environmental influences as for
instance in Smith (1966)

- the depth of the possible theoretical approach,
related to the level of ambition above and to the
attempt to classify errors, discussing their
nature, as in Langefors (1968a), Smith (1966), Root
& Sadacca (1967), Owsowitz & Sweetland (1965).
To the extent that such a theoretical approach is
found in EDP literature, it could be included in
appendix A1, as we in fact did with Owsowitz &
Sweetland's discussion of approaches to error.


- originality of the approach, in considering
influences which were ignored by most other reviewed
investigations, e.g. Berglund & Larson's study
of punched card layout, Smith's or Root & Sadacca's
study of so-called content or omission errors.
Another aspect of originality of approach may be
the use of original methods in detecting or
correcting errors, as for instance the development of
predicting routines by Carlson (1963) based on the
decision-tree heuristics suggested by Newell, Shaw
and Simon.

- generality of the approach, in covering many
possible aspects of the error or quality problem, as
done by Smith (1966) or by EDP Analyzer (1971a,
1971b). EDP Analyzer, however, obtains generality
thanks to its overview approach, mostly referring
to relevant sources of literature.

- clarity in the explanation of used concepts or 
performed investigations, preventing ambiguity in 
the mind of the reader. An excellent example of 
desirable clarity is given in the Berglund & Larson 
paper. 


All the above modes of appreciation guided the
selective abstracting in appendix A2.


WHAT DOES THE 
QUANTITATIVE LITERATURE CONTAIN ? 


Many of the reports result from the application of 
statistical methods. 


Variables are generally related to 

- types of entry devices, types of keyboards 

- use of punch verification, check digits etc. 

- skills or experience of operators at entry devices 

- grouping, length, composition (alpha content, etc.) 
of messages, punched card layout etc. 

- aural versus visual presentation of stimulus (ori- 
ginal) 

- rate of presentation of stimulus or time-pressure 
on entry 

- use of mnemonic codes or letter-pattern familiar 
codes 

- management or supervision emphasis on accuracy or 
speed of entry 

- allocation of entry functions between the creator 
of source document and operator of entry device 

- use of pre-assigned media such as pre-punched cards,
badges for personal identification or
identification of remote terminal.


Performance of the entry process or of the handling
of information is generally expressed in terms of
ERROR RATES which relate to the degree of identity
between the stimulus or original message and the
output from the human subject or from the entry device
activated by the human operator. Sometimes the
check of identity is extended to the output from
some editing routines in the computer system.


Whenever the nature of the information handling
process prevents a simple one-to-one correspondence
between input and output, new performance measures
are proposed either in terms of communication
theory, applied e.g. by Van Gigch to models of
"integrative behaviour", or in terms of specially
developed error-classification schemes as by Berglund &
Larson.


Smith offers an interesting list of alternative
criteria for data collection performance:

- time per entry would be meaningful only in those
cases when a substantial portion of the subject's
time is devoted to the entry process

- rate of information flow (as proposed by e.g.
Cardozo & Leopold and by Van Gigch) has no frame
of reference for inclusion of omitted or
incomplete messages, but it is interesting for its
combining of speed and accuracy in one measure.

- number of consecutive good entries between
mistakes would be of no practical utility because,
Smith says, the computer system normally has to
analyze all input messages. Martin's and Norman's
discussion of accuracy in communication networks and
Langefors' reference to the importance of many
small transactions for the impact of errors on
administrative EDP systems, however, suggest that
such a measure may be meaningful in some respects.

- ratio of volume and time of supervisory
(administrative) messages to system input messages is
said to be too dependent on many environmental
characteristics. It is however interesting since it
seems to imply the important concept of
error-correction.


Smith finally chooses the PERCENTAGE OF INACCURATE
OR INAPPROPRIATE ENTRIES as the most UNIVERSAL
CRITERION OF DATA COLLECTION PERFORMANCE.


QUESTIONS THAT ARE RAISED BY THE LITERATURE 


While reviewing the literature, several questions
are raised beyond the above discussion of the meaning
of performance and error rates. The questions concern
how to compare and use error rates in the face of
differences and ambiguities in the nature of the
reported figures.


Error rates are either expressed in terms of errorless
entries (i.e. all entries except those with AT LEAST
ONE error), where entries consist of different amounts
of symbols (message lengths), or they may also be
expressed in terms of individual symbol errors.
Symbols for message syntax (such as field separation
or field and record identification) may or may not
be included in the error statistics.
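

The difference between the two kinds of figures can be
sketched numerically. The following minimal Python
sketch is not taken from the reviewed literature; it
assumes, purely for illustration, that symbol errors
occur independently and with the same probability at
every position. The function names are hypothetical.

```python
def entry_error_rate(symbol_error_rate: float, length: int) -> float:
    """Probability that an entry of `length` symbols contains at least
    one error, assuming independent, equally likely symbol errors
    (an illustrative assumption, not a finding of the literature)."""
    return 1.0 - (1.0 - symbol_error_rate) ** length


def symbol_error_rate_from_entries(entry_rate: float, length: int) -> float:
    """Inverse conversion under the same independence assumption."""
    return 1.0 - (1.0 - entry_rate) ** (1.0 / length)
```

Under this assumption a symbol error rate of 1 %
corresponds to roughly 5 %, 10 % and 18 % of entries
in error for messages of 5, 10 and 20 symbols, which
is one reason why figures reported for messages of
different lengths cannot be compared directly.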


Error figures may include errors that were detected
and possibly corrected by the operator himself
during the entry process, but such figures may also
refer only to the undetected or residual errors.
Uncorrectable errors may designate the same thing
as so-called residual errors, i.e. those errors
which were not corrected at the last step before
entry into system computation. Uncorrectable errors,
however, sometimes designate those errors which are
detected by checks at the entry device but are
attributable to an error in the source document: the
error is not caused by the operator and therefore is
not correctable by him without heavy loss of so-called
efficiency in the entry process.


Error rates after detection and correction by the
operator himself at entry should not be equated with
error rates at input to the computer editing and
validation routines, since sometimes entry
verification (e.g. punch verification) is done by
another operator in an independent entry procedure,
and/or by verification-validation checks by software
incorporated in the entry device.


Comparison of error rates for messages of different
lengths is furthermore complicated by the use of e.g.
prepunched sections in the messages and by many
ambiguities in the terminology. DIGITS may commonly
denote arabic numerals but sometimes they are used
in expressions like "10-digit numeric data words",
in which case the term is understood to be used also
for denoting alphabetic characters. In such a case
"digit" is equivalent to ALPHA-NUMERIC character or
SYMBOL, but "symbol" may rather be used to include
special signs and letters from foreign languages,
not belonging to a particular alphabet. LETTER is
often used as a synonym for CHARACTER. Finally one
meets ambiguities in the meaning of terms like DATA,
which may stand for all the previous concepts of
digit, character, symbol etc., but also for CODE,
MESSAGE, ITEM, and in general the ENTRY'S DATA
REPRESENTATION.


The most serious difficulties of interpretation of
results, in the sense of being able to compare and
use the reported error rates, however, stem from
the environment in which they were obtained. It may
be e.g. FIELD or LABORATORY. If field, it can still
be field trial or field experiment, or field
operational (as in Kramer, 1970). If laboratory, it may
still simulate field inputs (as in Root & Sadacca,
1967). Finally, some results are a flat statement
of experience, presumably based on field or
laboratory reports (as in Orlicky, 1969). Carlson (1963)
presents a study of historical field data.

Such different environmental conditions may explain
the appearance of error rates such as percentages
in terms of types of inputs (e.g. percent of
messages of 10-digit length which were in error) or in
terms of persons (e.g. percent of entries made by
subject A which contained at least one error).


Different environmental settings also imply many
special handlings and exceptions in the processing
of original error information: for instance, sometimes
errors in the "cents" positions of dollar amounts
were not counted as errors; the same happened to
those original errors which could be ascribed to
poor handwriting on the original source document.
In other cases some symbols were not used which
could be visually or aurally confused with other
symbols (e.g. M can be aurally confused with N).
In one case the investigators report that they did
not count as errors those which would conceivably
have been prevented or detected in an operational
field environment, by means e.g. of better training
or programmed validity checks.





WHAT CAN BE STATED ON THE BASIS OF THE RESULTS ?


It is apparent that any advice based on the reviewed
literature must be qualified by "if", "possibly"
etc., including a recommendation of careful evaluation
of the original literature.


At the level of general advice we could gather 
guidelines like the following: 


1. Errors increase as the number of characters in
the data code (code length) increases. Longer
codes should be avoided; if that is not possible,
they should be divided into smaller units of three
or four characters, e.g. 123-4567 instead of
1234567.

2. The characters used in data codes should avoid
digits or letters that can easily be confused
with each other, such as I versus 1, 2 versus Z,
slash (/) or virgule (,) versus number 1, letter
O versus Q, O versus 6, U versus V.

3. Nonsignificant codes should avoid characters that
sound alike when pronounced, such as M versus N,
B versus P.

4. Significant or meaningful data codes are
preferred over non-significant ones, since this
facilitates recall by the human coder and reduces
errors. For example, M and F are expected to be
more reliable for MALE and FEMALE than 1 and 2.

5. When the code is composed of both alpha and
numeric characters, similar types of characters
(alpha or numeric, respectively) should be grouped
and not dispersed throughout the code. For
example, fewer errors occur in a three-character
code where the structure is alpha-alpha-numeric
(e.g. HW5) than alpha-numeric-alpha (e.g. H5W).

6. When designing a code number system, try to avoid
the chance of double occurrences of a character.
Repeating characters are a major source of
transcription error: the chance of error is
greater in transcribing 31146 than in transcribing
31046.

7. Use check digits whenever possible and
appropriate.

8. Avoid the use of variable length, fixed order
punch card layout unless the higher probability
of errors is offset by other advantages.

9. In the design of number check routines in
verification, consider that most digit
manipulation errors are caused by single digit
substitution, followed by omissions.

10. In general, use sight verification when data is
of language type, i.e. in terms of words and
phrases, and key verification when the data must
be compared on a character-by-character basis.

11. Consider that there are limits to the accuracy
of human sight-verification capability: the lower
the frequency of errors to be detected, the smaller
the percentage of them that will in fact be
detected by the human sight verifier.

12. In selecting punch machine operators, consider
that the fastest operators are also those who tend
to make the fewest mistakes. (In addition there
are psychological tests for selecting such
personnel.)

13. Easy correction of operator mistakes at entry
devices tends to enhance both the speed and
accuracy of input. The same is true for easy
detection in terms e.g. of answer-back tones at
direct entry devices.

14. Confirmatory answer-back tones should not be too
long since they can lead to other kinds of
erratic behaviour by operators who get impatient.

15. The profitability of punch verification should
be continuously questioned since it removes a
very limited proportion of punch errors.

16. Consider source errors, sometimes called content,
event, omission, procedural, misidentification,
miscount, etc., generally more important in
percent and seriousness of consequences than other
entry operator errors and hardware or
communication-links errors.

17. No preference, in general, can be stated for the
use of alpha or numeric codes in a particular
system.

18. No preference, in general, can be stated on
whether the person making the data entry as operator
of an entry device should be the same as the
person creating the original source information.

19. No statement, in general, can be made about the
effect of using pre-assigned media such as
pre-punched cards on the accuracy of input.

20. Coding errors can be reduced at the entry stage
by providing keypunch operators with knowledge
of the set of possible codes. This effect is
greatest with mnemonic codes.

21. There seems to be a substantial advantage in
accuracy in copying a code by hand immediately
beneath the original. Forms, dockets, etc., should
be designed in such a way that this is possible.

22. Ten-key keyboards yield a significantly lower
error rate and are preferred by operators,
compared to other devices such as levers, matrix
keyboards, rotary knobs and telephone sets.

23. Speed of human sight-check of errors is highest
for groupings of 3 to 4 digits in numeric
material, and it is inversely related to the
frequency of errors to be detected. The percentage
of undetected errors increases with the speed
of checking but is not influenced by variation
in grouping.

24. For several tasks including keyboard entry and
telephone dialing, grouping of digits by 3's and
4's is consistently best in speed with no
tendency to differences in error rate. Users often
state preferences for larger groups than those
producing best performance.

25. For codes of a given length (number of characters)
coding errors tend to be proportional to the alpha
content.

26. It cannot be stated that the use of mnemonic codes
reduces coding errors. However, letter-pattern
familiarity affects coding errors: codes
containing letter pairs in familiar sequences (e.g.
AT, BY, OK) have lower error rates. (Example of a
mnemonic code: OVH for "overhead".)

27. Time pressure on making data entries need not
affect the rate of initial original errors of
entry, but it may contribute to higher rates of
residual errors by affecting the rate of both
detection and correction of mistakes by the
operator at the point of entry.

28. The rate of correct information that is
retrievable from coded information depends not only
upon the error rate of the coding process but also
upon the detectability of errors. This latter
concept includes consideration of the ratio of the
number of codes used to the total number of
possible codes which may be obtained from all
combinations of the allowable character set.
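

The guidelines on check digits, on the predominance of
single digit substitutions, and on detectability can
be illustrated together. The sketch below uses the
well-known Luhn (modulus 10) check digit, which is
chosen here only as an example and is not named in the
reviewed literature; the function names are
hypothetical. With one check digit appended, only one
in ten of the possible digit strings of a given length
is a valid code, and every single digit substitution
turns a valid code into an invalid one.

```python
def luhn_check_digit(payload: str) -> str:
    """Return the Luhn (mod 10) check digit for a string of digits."""
    total = 0
    # Walk the payload from right to left and double every second
    # digit, starting with the rightmost payload digit.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return str((10 - total % 10) % 10)


def is_valid(code: str) -> bool:
    """True if the last digit of `code` is the Luhn digit of the rest."""
    return (
        code.isdigit()
        and len(code) > 1
        and luhn_check_digit(code[:-1]) == code[-1]
    )


def single_substitutions(code: str):
    """Yield every code that differs from `code` in exactly one digit."""
    for pos, original in enumerate(code):
        for replacement in "0123456789":
            if replacement != original:
                yield code[:pos] + replacement + code[pos + 1:]


# Example code number taken from the guideline on repeated characters.
code = "31046" + luhn_check_digit("31046")
undetected = [c for c in single_substitutions(code) if is_valid(c)]
print(code, "undetected single substitutions:", len(undetected))
```

Omissions and some transpositions of adjacent digits
are not all caught by this scheme, so it addresses
only part of the error types discussed above.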


COMMENTS ON THE STATEMENTS OBTAINED FROM
THE REVIEWED LITERATURE


The guidelines which were suggested in the previous
section appear much more useful than any speculations
about proper definitions of quality, accuracy etc.
However, this does not imply that we have bypassed
the conceptual difficulties of the quality problem.
Maybe the guidelines cannot be applied to the
particular installation for which one wants to improve
accuracy. Maybe they are not so useful as we wish.


Agreement among different authors terminates at a
very low level of ambition indeed. For instance,
Bürotechnische Sammlung (1956), Carlson (1963), and
Smith (1966) agree quite well on such a simple matter
as the proportion of digit manipulation errors which
may be expected to consist of single digit
substitution, say more than 60 %. But concerning
omissions, Bürotechnische Sammlung gives the figure 7 %
while the other two give about 20 %. Conrad & Hull
(1967) for their part show, with their wide variation
of percent figures, that they require much closer
analysis for appropriate interpretation.


Wright (1952) suggests that 0, 3, 6 and 7 are those
digits which lead to most unreadable and ambiguous
readings (combined). Owsowitz and Sweetland (1965)
suggest instead that 2 is the most incorrectly
reproduced digit. Upon closer analysis it will be found
that Owsowitz and Sweetland also included letters
in their investigation, leading to the 2 being very
often confused with the letter Z, and this explains
the differences between the two findings.


Concerning the use of either alpha or numeric
characters in the construction of codes, EDP Analyzer
(1971b) refers to Davidson, who advises the use of
numeric codes only. On the other hand Conrad & Hull
(1967), in the context of manual copying of codes,
state that the conclusion that digit codes are
preferable to letter codes "...must be thickly
surrounded by qualifications", mainly because of the
possibility of utilizing language habits. Furthermore,
Owsowitz & Sweetland explicitly state that the fact
that the error rate for alpha codes is generally
several times greater than for numeric codes does not
mean that they should be avoided; the decision will
depend upon several other considerations, since alpha
codes can transmit a good deal more information per
character than numeric codes can.


The ambiguity of advice and guidelines does not
decrease but obviously rather increases when reaching
more subtle problems. Let's just illustrate the case
of whether the operator making the data entry at the
entry device should be the same as the person who
creates the original document or codes the
event-observation:


A reviewer of Root & Sadacca's paper concludes that
their findings seem to justify the following.

"The direct entry method (same person doing both jobs)
seems to be recommended where the best total speed
and accuracy are needed, where there is no reason to
save the message generator's time by delegating the
data entry task to another, and where he could be
taught typing efficiency (e.g. more than 35 w.p.m.
i.e. words per minute). Examples of this type of
situation are mainly on the military field but could
also be, for instance, the air traffic controller's
task. However, where the message generator is a
costly specialist (e.g. a hospital doctor), or where he
is not and cannot be taught to be a fast typist, then
his time could be saved by having a clerk to do the
data entry. But in such a case, when ERRORS might
sometimes be vital (e.g. drug prescriptions in
hospital), it could well be advisable for the specialist
to enter certain details directly, especially since
the experiment showed significantly worse errors when
transcription was by another person."


Thus, the reviewer concludes, it seems clear that
any decision on the method must depend on a DETAILED
AND THOROUGH ANALYSIS OF THE DATA ENTRY TASK AND OF
THE SITUATION REQUIREMENTS. CAUTION IS NEEDED IN THE
INTERPRETATION OF THIS EXPERIMENT. More research
would be desirable to enable better guidance.
(Shackel, 1969, p.159).


Smith (1966) for his part states that although his
study "showed no clear preference for clerical, group
or individual production worker reporting, the choice
might be dictated by the NATURE OF THE PRODUCTION
CYCLE OR FLOW. All other things appearing to be equal,
it would be preferable for the person recording events
to be the one most affected by the ACCURACY and
TIMELINESS of the entries. The complexity of messages
transmitted and the variety of types of transactions
made by an individual can be limited by assigning
him uniquely to a device at a single work station
where his primary duties are related to a production
task rather than data collection. If personnel are
required to make only occasional entries in a
variety of message construction forms, individual
differences in performance can play a dominant role in
establishing the expected accuracy. In these cases
where procedural mistakes might be caused by low volume
reporting from a work station or by message
complexity, a clerk with primary emphasis on recording
the data could be the best choice. Reporting events by
groups compromises these issues and reduces the
required quantity of relatively expensive data
collection terminals, but increases the non-productive
travel time to an input terminal."


By way of comment we may now realize some of the
manifold implications of the above problem. From
what is said it looks as if ACCURACY were some
composite function of MOTIVATION (for high accuracy),
and VARIABILITY & FREQUENCY - say FAMILIARITY - of
certain tasks.


Familiarity may be seen as referring to the
performance of the original "object" task as well as its
original observation and coding, but it may also
refer to the task of entering the coded observation,
directly or by transcription from e.g. an original
form, into the system. Smith talks about both tasks
as "recording", probably because in his production
environment the entry was made directly by the
worker-observers into the remote terminals. In his work
Smith nevertheless keeps the distinction between the
two tasks by classifying errors into different
types; the distinction, however, cannot be considered
as clear as in Root & Sadacca's study.


Concerning the choice itself between direct and in- 
direct entry, one criterion appears to be the maxi- 
mization of familiarity, but at the same time a 
trade-off is envisaged against motivation. 


It is difficult to find support for the suggestion
that direct entry is recommended when best total
speed and accuracy are needed. Indirect entry, by
saving the time of the observer-coder, might be
preferable, not so much because the time of costly
specialists is expensive, but rather because of lower
rates of certain kinds of errors in the performance,
observation and coding of the original "object" task.
The lower rates of such errors might well compensate,
and more than compensate, for an increase in the rate
of other, less important transcription errors.


The above comments are concerned with the allocation
of data-entry and observation-coding tasks. A similar
discussion could be held, but is left outside the
scope of this paper, concerning the use of pre-punched
cards and other computer-prepared turnaround
documents. In that case we would have Davidson's
suggestion, as referred to by EDP Analyzer (1971b, p.9),
to be qualified by the empirical findings of Smith
(1966, p.16,66) and Kramer (1970, p.246). These last
two authors suggest that the use of pre-punched
cards, badges for individual or machine
identification etc. may have a negative effect on accuracy
because of increased opportunities for certain
procedural mistakes which are not offset by the system's
detection and correction features.


What conclusions can we draw from the above comments
on the statements obtained from the reviewed
literature ? We do not know how to use the reported
figures of "hard" research on error rates. We do not
know how much confidence to place in general advice,
not even in those cases where it is based on
experimental confirmation of common-sense guesses.
We do not know what ACCURACY really is: we are rather
told what it might depend on, in certain
circumstances.


In order to formulate the only conclusion which
appears to be safe, we are tempted to borrow the
words from some of the reviewed papers and
formulate the following:


"In any specific situation, the decisions for
improving accuracy will have to be based on an analysis
of the task and of the situation requirements, of
the nature of the task cycle and flow."


And this is about the same as saying nothing, a
meager result indeed, considering the scope and
statistical ambitions of the reviewed papers and the
ambitions of our own study! At this point we feel
that it is also doubtful whether any support can be
found for Shackel's statement that "more research
would be desirable to enable better guidance", if
one thinks of the research being done along the same
lines as that which we have reviewed.


THE GENERAL SETTING OF THE 
EMPIRICAL QUANTITATIVE RESULTS 


In order to come out of the impasse, let's go back
to statement No. 16 in the earlier list of
statements, in the section where we asked ourselves:
"what can be stated on the basis of the results
(of the review of literature with empirical
quantitative results) ?".


Statement No. 16 is of our own make, and it was
suggested by some of the literature. It states the
following.


"Consider source errors, sometimes called content,
event, omission, procedural, misidentification,
miscount, etc., generally more important in percent and
seriousness of consequences than other entry-operator,
hardware or communication-links errors."


A review of the literature indicates that errors
in EDP hardware and communication links are often
associated with figures of about 1:100,000 or less.
Similarly, entry-operator (e.g. punch-machine
operator) errors are about 1:100 in order of magnitude.
(Let us for the moment forget the problem of
interpreting the units of such figures. The reader is
referred for this purpose to the earlier discussion
in this chapter.)


However, as soon as the literature touches on the 
subject of what we in statement 16 called "source" 
errors, error rates seem to soar up to 1:10 or 
1:5 without difficulties. 
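

The orders of magnitude quoted above can be put side
by side in a short calculation. The sketch below is an
illustration only: it assumes that the three rates
refer to the same unit (say, one entry) and that the
stages err independently, neither of which is
established by the reviewed literature. The stage
names and the function are hypothetical.

```python
from math import prod


def combined_error_rate(stage_rates):
    """Probability of at least one error per entry across all stages,
    assuming the stages err independently (illustrative assumption)."""
    return 1.0 - prod(1.0 - rate for rate in stage_rates.values())


# Orders of magnitude quoted in the text, read here as "per entry".
stages = {
    "source": 1 / 10,
    "entry_operator": 1 / 100,
    "hardware_and_links": 1 / 100_000,
}

total = combined_error_rate(stages)
without_source = combined_error_rate(
    {name: rate for name, rate in stages.items() if name != "source"}
)
print(f"all stages:     {total:.4f}")
print(f"without source: {without_source:.4f}")
```

With a source error rate of 1:10 the combined rate is
about 11 %, against about 1 % when source errors are
left out, so that under these assumptions the source
dominates the total.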


Figure 2.1 is an attempt to visualize the general
experimental setting which, in the reviewed
literature, led to the rates mentioned above.


Figure 2.1 shows a source with an ongoing series of
activities which are observed and coded in a coding
process (2). Such codes may generally be registered
on an original document like a form which is
subsequently used in a data entry process (3). The data
entry may, as for instance in the case of keypunching
of cards, be followed by a correction process (4)
that in the example would be a keypunch verification
(and correction). The verified inputs so obtained
are then, possibly after being transmitted through
a communication channel, submitted to so-called
editing, validation or diagnostic preparatory
programs of the computer (6), prior to their input and
use in the normal information processing programs.


The source, which could also be visualized as a set
of processes, is also designated by a number, (1),
in spite of standing for more than information
processes proper, such as those that follow.


Figure 2.1 suggests that the source might begin by
contributing to the total error rate with what we
call in this context "source errors". The coding
process results in the information set that we label
ORIGINAL INPUTS or SOURCE DOCUMENTS, possibly
supplemented with check digits or control totals. This
is the first existing information set in the sense of
the reviewed literature (information related to the
EDP system) and it will, besides the previously
mentioned undefined source errors, include CODING
ERRORS added in the course of the coding process.


The data-entry process transcribes the original in- 
puts or source documents to e.g. card, tape or disc, 
i.e. to INPUTS IN MACHINE READABLE FORM, and it will 
contribute with TRANSCRIPTION ERRORS. This data-entry 
process may use devices with built-in programmed 
verification and validation (in the sense explained 
by e.g. EDP Analyzer, 1971a) and correction.


[Figure 2.1: The general setting for experiments
and measures leading to the reviewed empirical
quantitative results. The process steps are numbered
(1) to (7).]


The prefix "self" stands for the feature being
incorporated in the entry device, rather than being
performed by the human operator.


To the extent that verification and correction are
not performed, or are not sufficient, at the data-entry
stage, they will be performed separately at the
following correction stage. Correction is seen to
include the detection sub-process (e.g. sight or
key as in keypunch verification) and the correction
itself, leading to what we labeled VERIFIED INPUTS.
Verification, validation and correction in the
data-entry process (3) and in the correction process (4)
will delete some of the errors previously introduced
in the chain, but - at least theoretically - may
introduce errors of their own, which we label
CORRECTION ERRORS (e.g. correcting an input which is
actually right, so that it becomes wrong - Klemmer,
1959, is one author who considers this problem).


The verified inputs may be submitted to a transmission
process by a communication system, resulting in what
we label COMMUNICATED INPUTS, which include undetected
TRANSMISSION ERRORS (we omit here the detailed
breakdown of the communication problem, which is
considered e.g. by Norman, 1971). Such communicated
inputs are finally used in the computer input process
(6) leading to the final INPUTS which include what is
usually labeled as RESIDUAL ERRORS.
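
As a rough illustration of how the stages of figure 2.1
combine, the following Python sketch lets each stage add
its own kind of error and lets the checking stages remove
a share of the error kinds they are able to detect. All
rates, the name run_pipeline, and the choice of which
errors each check can catch are hypothetical assumptions
made for the sketch; they are not figures from the
reviewed literature. Rates per error kind are simply
kept side by side, which is only reasonable while the
rates are small.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Stage:
    name: str
    adds: Optional[Tuple[str, float]] = None  # (error label, rate added here)
    catches: Dict[str, float] = field(default_factory=dict)  # label -> share removed


def run_pipeline(stages):
    """Return the error rate per error label after all stages have run."""
    errors: Dict[str, float] = {}
    for stage in stages:
        # Checks first remove a share of the error kinds they can detect ...
        for label, share in stage.catches.items():
            errors[label] = errors.get(label, 0.0) * (1.0 - share)
        # ... then the stage contributes its own kind of error.
        if stage.adds is not None:
            label, rate = stage.adds
            errors[label] = errors.get(label, 0.0) + rate
    return errors


# All rates below are hypothetical placeholders, not measured values.
FIGURE_2_1 = [
    Stage("(1) source", adds=("source", 0.10)),
    Stage("(2) coding", adds=("coding", 0.02)),
    Stage("(3) data entry", adds=("transcription", 0.01)),
    Stage("(4) correction", adds=("correction", 0.001),
          catches={"transcription": 0.90}),
    Stage("(5) transmission", adds=("transmission", 0.0005)),
    Stage("(6) computer input", catches={"transcription": 0.50,
                                         "transmission": 0.50}),
]

if __name__ == "__main__":
    for label, rate in run_pipeline(FIGURE_2_1).items():
        print(f"{label:<14} {rate:.5f}")
```

Under these assumed settings the checks reduce only the
transcription and transmission errors, while whatever
entered at the source and coding stages passes through
unchanged.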


What does this visualized experimental setting tell
us? In the first place it draws our attention to the
possibility of placing emphasis on different stages
of the overall process. Before going any further
let us associate figure 2.1 with another similar
figure that is found in the scientific literature.


THE COMMUNICATION-APPROACH 
TO THE ACCURACY PROBLEM 


In discussing the case of a "discrete channel with
noise" in the context of his mathematical theory of
communication, C.E. Shannon (1949) considers the
problem of a signal that is perturbed by a chance
variable - called NOISE - during transmission or at
one or the other of the terminals. He considers the
case in which the received signal is not the same as
that sent out by the transmitter, and when it does
not always undergo the same change in transmission
(distortion), i.e. most generally he considers the
case when the RECEIVED SIGNAL IS NOT A DEFINITE
FUNCTION OF THE TRANSMITTED SIGNAL.


In order to develop a theorem that gives a direct
intuitive interpretation of the average uncertainty
of the correctness of the received signal, Shannon
considers a communication system and an observer
(or auxiliary device). THE OBSERVER CAN SEE BOTH
WHAT IS SENT AND WHAT IS RECOVERED (WITH ERRORS DUE
TO NOISE). THIS OBSERVER NOTES THE ERRORS IN THE
RECOVERED MESSAGE AND TRANSMITS DATA TO THE RECEIVING
POINT OVER A "CORRECTION CHANNEL" TO ENABLE THE
RECEIVER TO CORRECT THE ERRORS. The situation is
indicated schematically by Shannon in figure 2.2
below, which we have slightly changed for the sake
of clarity in the following discussion.


[Figure 2.2: Schematic diagram of a general
communication and correction system.]


It is now apparent that the communication approach
to the accuracy problem, as illustrated in figure
2.2, can only be applied to the process steps (3),
(4), (5), (6) of the earlier figure 2.1 where we
visualized the general setting of the empirical
quantitative results. To the extent that one is able
to consider the output of an EDP program as a
FUNCTION of the input information, there is a
possibility to apply the communication approach also
to step (7). In any case this appears to be the
implicit basis for present thinking in AUDITING OF
EDP SYSTEMS.
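
The role of the observer and the correction channel can
be made concrete with a small Python sketch. It assumes
the simplest possible case, which is our assumption and
not part of the setting of figure 2.1: binary symbols,
both equally likely, each received wrongly with the same
probability and independently of the others. For that
case the average uncertainty about what was sent, given
what was received, equals the binary entropy of the
error probability, and this is the amount of correction
data per symbol that the observer must be able to send.
The function names are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a two-valued chance variable with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def correction_bits(n_symbols, error_rate):
    """Average correction data (bits) needed for n_symbols under the
    assumptions stated above (binary, symmetric, independent errors)."""
    return n_symbols * binary_entropy(error_rate)


if __name__ == "__main__":
    for rate in (0.001, 0.01, 0.1):
        print(f"error rate {rate}: {correction_bits(1000, rate):.1f} "
              "correction bits per 1000 symbols")
```

The sketch presupposes exactly what the following
discussion questions: an observer who sees both what
was sent and what was received.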


The important thing to note in the context of applying
the communication approach is that in all cases
one assumes the existence of an "objective" OBSERVER
WHO "KNOWS" THE TRUTH OR CORRECTNESS OF TWO OUT OF
THE THREE ELEMENTS - INPUT, FUNCTION, OUTPUT - and
is therefore in a position of "authority" for
"CORRECTING" THE THIRD ONE. For example, if one knows
that the customer address printed by the computer-printer
on the invoice is not true (i.e. wrong), and also knows
that the program updating the customer file is true
(i.e. right), then one can deduce that the input to
the program was not true (i.e. wrong). The "one who
knows the truth" is what we labeled as the "objective"
observer. In specific situations, the objective observer
sometimes appears disguised under other labels
such as system analyst, manager, decision maker,
investigator, researcher, verifier.


From the above it is apparent that it will be
troublesome to apply the communication approach to those
steps of figure 2.1 which include truths of doubtful
observability, such as steps (1) - which deals with
events outside the frame of the reviewed literature -,
(2), and (7).


Let us consider process (2), coding.
What is a RIGHT or ACCURATE input to the coding
process, since such input appears prior to our
formalization in terms of information? Whenever the
reviewed literature has touched on related problems,
e.g. Owsowitz & Sweetland (1965) and Van Gigch (1970a
and 1970b), it has assumed the existence of a certain
set of right inputs; this is a particularly important
assumption as remarked by Weaver (Shannon & Weaver,
1949), since it emphasizes that the general
setting of the analytical communication studies deals
with only the first level, A, out of three possible
levels of communication problems:


A. How ACCURATELY can the symbols of communication
be transmitted? (The technical problem).

B. How PRECISELY do the transmitted symbols convey
the desired meaning? (The semantic problem).

C. How EFFECTIVELY does the received meaning affect
conduct in the desired way? (The effectiveness
problem).


Shannon & Weaver's mathematical theory of communication
then deals only with level A. It is therefore
left unsaid whether the subdivision in the other
two levels (semantic and effectiveness), as well
as Weaver's use of the words ACCURACY versus PRECISION
and EFFECTIVENESS, are in some sense scientifically
justified. In our opinion, the distinction
among these words as suggested by Weaver does not
assist our research on the issue of quality of
information.


In any case it is now clear that most reviewed
empirical results, as suggested by appendix A2, adopt
the communication approach and as such deal with
all processes of figure 2.1 except (1), (2) and (7).
An analysis of the quality of information in these
terms apparently disregards the most important
aspects of quality relative to data banks and
information systems for administrative control.


Furthermore, we do not know of any proof showing
that such most important aspects are intractable in
terms of approaches other than the communication
approach. On the contrary, the physical sciences
make extensive use of the concepts of accuracy
and precision in situations where no "observer" is
idealized who can compare a supposedly "true" input
to the output etc., and where the inputs are
considered to be members of a set of possible true inputs.
The example of quality concepts from the physical
sciences suggests alternative approaches to the
problem.


THE REVIEWED LITERATURE GIVES PRACTICAL EXAMPLES
OF IMPORTANT UNSOLVED QUALITY PROBLEMS


EDP Analyzer (1968), referred to in appendix A1, in our
opinion touches on some symptoms of the most important
quality problems when listing EVENTS THAT DO
NOT CONFORM TO POLICY as one of the major causes
of poor data. At the same time it differentiates
such a cause from INCORRECT CODING OF CLASSIFICATION
FIELDS, suggesting that in terms of our figure 2.1
both causes may correspond to the source and coding
processes (1) and (2).


Smith (1966) classifies mistakes into FORMAT errors,
CONTENT errors, and EVENT errors. Format errors are
defined by him as items that can be detected and
screened from system input (such as wrong message
length, illegal characters or malfunction of data
entry equipment). Content errors are items that have
correct form, but can be detected as logically
inconsistent (such as shop status contradictions,
unusual quantities, wrong machine or operator
designations). Event errors are those items that have
correct form and are logically processable, but prove
inconsistent after subsequent entries or upon use
(such as omitted entries, failure to correct detected
mistakes).
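
Smith's three classes can be read as three successive
kinds of check. The following Python sketch is one
possible reading, not Smith's procedure: the 12-character
message layout, the machine list, the quantity limit and
the start/finish rule are all hypothetical, invented
only to give each class something to test against.

```python
KNOWN_MACHINES = {"M01", "M02", "M07"}  # hypothetical shop configuration
MAX_PLAUSIBLE_QTY = 500                 # hypothetical plausibility limit


def format_errors(raw):
    """Detectable from the message alone: wrong length, illegal characters."""
    problems = []
    if len(raw) != 12:
        problems.append("wrong message length")
    if not raw.isalnum():
        problems.append("illegal characters")
    return problems


def parse(raw):
    # Hypothetical fixed layout: job(4) machine(3) action(1) quantity(4)
    return {"job": raw[0:4], "machine": raw[4:7],
            "action": raw[7], "qty": raw[8:12]}


def content_errors(rec):
    """Correct form, but logically inconsistent with what is known."""
    problems = []
    if rec["machine"] not in KNOWN_MACHINES:
        problems.append("unknown machine designation")
    if rec["action"] not in ("S", "F"):
        problems.append("unknown action code")
    if not rec["qty"].isdigit() or int(rec["qty"]) > MAX_PLAUSIBLE_QTY:
        problems.append("unusual quantity")
    return problems


def classify(messages):
    """Return (message, class, problems) for every message found in error."""
    started = set()
    report = []
    for raw in messages:
        problems = format_errors(raw)
        if problems:
            report.append((raw, "FORMAT", problems))
            continue
        rec = parse(raw)
        problems = content_errors(rec)
        if problems:
            report.append((raw, "CONTENT", problems))
            continue
        if rec["action"] == "S":
            started.add(rec["job"])
        elif rec["job"] not in started:
            # Only visible in the light of earlier entries: an omitted start.
            report.append((raw, "EVENT", ["finish without a start entry"]))
    return report


if __name__ == "__main__":
    sample = ["J001M01S0010", "J001M01F0010", "J002M09S0010",
              "J003M02F0020", "J004M02S00", "J005-M2S0010"]
    for raw, kind, problems in classify(sample):
        print(kind, raw, problems)
```

A message that names a valid but wrong machine passes
all three checks in this sketch; none of them has
anything to compare the reported event with.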


Smith furthermore points out that some comparisons 
between error rates in the field and in the labora- 
tory experiment must make allowance for the fact 
that CONTENT MISTAKES WOULD BE FEWER IN THE EXPERI- 
MENT BECAUSE MISIDENTIFICATION AND MISCOUNTS WERE 
NOT INVOLVED.


Referring to the card-verification procedure used 
to check the accuracy of card punching operations, 
Smith states that such card verification procedure 
can only identify mechanical and copying mistakes, 
HAVING NO FRAME OF REFERENCE to analyze event
description, misidentification or miscount, and many
format inconsistencies.


What is this "frame of reference" that Smith is
referring to? We think that it has much to do with
the theoretical understanding of the quality of
information that we ourselves are looking for. The
expression "frame of reference" unhappily belongs to
the class of heavily misused words ("concept" is
another such misused word) but it appears that Smith
considers his classification scheme as the frame of
reference appropriate for the object of his study.
We cannot accept such a frame of reference for our
purposes since Smith does not motivate it with
considerations of scientific method which assure its
generality and indicate how it will be used.


For instance, why does Smith consider corrections to
good entries as CONTENT mistakes, while failure to
correct detected mistakes is considered as an EVENT
mistake? (See p. 5, 6, 39, 40 of Smith, 1966.) (It is
difficult to conclude whether some inconsistency in
the allocation of mistakes to different classes is
an unintended printing error.)


More important, however, we see the problem of
comparing Smith's frame of reference or classification
with, for instance, Root & Sadacca's (1967)
classification in SPELLING, OMISSION, CONTENT and SEQUENCE.
By omission, they mean any failure to enter a required
item of information; by content, they mean wrong
information such as wrong identification of the nature
of an event or object (e.g. "tank" instead of
"truck"); by sequence, they mean information items
in a message not being in the proper sequence.
We are then led to believe that Root & Sadacca's
omission, content and sequence are all included in
Smith's event type.


Similar comparisons may be made with Kramer's
PROCEDURAL and OMISSION errors; Berglund & Larson's
errors due to the NATURE OF MATERIAL (source
uncertainties) and OMISSIONS; EDP Analyzer's (1971a)
classification of data (and implicitly errors?) in
TEXT, JARGON, and NON SENSE.


This means to us that without a general theoretical
understanding of the quality issue one will not be
able to compare one's own error rates with Smith's
residual rate at about 4 % of entries or Root & Sadacca's
approximate rate of 2 %. And we have now seen that
the difficulty to compare is caused by much deeper
reasons than any ambiguity on what is meant by
digit-character-symbol, or ambiguity on the nature
of the message in terms of number of digits, including
or not including pre-punched sections etc. (See the
discussions in the earlier sections of this chapter).


Finally we see that even practical, empirical
approaches to the problem of quality of information raise
unavoidable important theoretical questions. Such
questions appear in spite of using a communication-naive
setting, because this setting is being applied
to some complex aspects of the information systems
problem.


SOME GENERAL CONSIDERATIONS ON THE MATERIAL IN 
THIS CHAPTER. SUMMARY. 


In the previous chapter we had met difficulties in 
defining and measuring the quality of information. 
We raised the question whether such difficulties
could be by-passed or avoided by applying a so-called
practical, realistic approach to the problem.


An extensive review of literature containing
empirical quantitative results disclosed a great number
of figures on error rates which proved difficult to
interpret and apply in practical situations. The
same appeared as a result of analysis of statements
containing advice on what to do in order to improve
quality, where the statements were explicitly or
implicitly obtained from the reviewed literature.


The remarkably higher rate of certain types of errors
reported in the literature suggested that they
referred to certain steps of a general information-processing
sequence. This sequence was visualized in terms
of a figure which encompassed the measurement setting
of most reported figures on error rates. This setting
was seen to be the same as the one used to
illustrate the quality-accuracy issue in communication
systems.


The communication approach to the quality of infor- 
mation was seen to be too limited for the purposes 
of application to data banks and information systems. 
Attempts to apply this approach to such an environment
raise many more questions than they are able to answer,
but they suggest that the unanswered questions are 
the most important ones justifying our further study 
in that direction. 


CONCLUSIONS FROM THIS CHAPTER 


1. Most available measures of information quality 
in quantitative terms assume a concept of quality 
in terms of communication theory (theory of signal 
transmission). 


2. The utilization of the above measures in a particular
information system, and the development of other
necessary measures, require a broader concept of
quality which can be made operational.


The above two statements were formulated from the
material contained in the sections of this chapter,
specifically: questions that are raised by the
literature, comments on the statements obtained from
the reviewed literature, the communication approach
to the accuracy problem, and the reviewed literature
gives practical examples of important unsolved
quality problems.


Before attempting the development of a broader
concept of quality we will dedicate the next chapter to
illustrating two possible consequences of lacking
such a broader concept. This illustration is intended
as an additional support to the conclusion of
the previous chapter regarding the importance of the
quality issue, and it will at the same time supply
a concrete feeling for the implications of the
theoretical developments of the broader concept.





AGGREGATION AND CODING: 
TWO CONTEXTS WHICH ARE LESS OBVIOUS 


In attempting to illustrate the implications of
a narrow understanding of information quality,
it is easy to think about the waste of research and
activities which are to be based on false
information premises. Alternatively one may think about the
damage inflicted on business and society resulting
from the implementation of false conclusions derived
from false premises.


Within the more limited scope of this paper we in- 
stead intend to illustrate the way in which the 
narrow understanding of the quality issue hides 
important exposures in the context of two quite fa- 
miliar and supposedly non-controversial activities 
of the data-bank and information system environment. 


AGGREGATION 


Aggregation, in the context of control systems, is
described by one author as being the description of 
a system by a lower order model, lower in the sense 
that the model variables in a given sense represent 
"averages" of the system variables. This given sense 
may be a mathematical function defining an "index" 
of the original variables. 


Emery (1969) expresses the function of aggregation
in the context of design and implementation of
organizational planning and control systems, as being
one way of obtaining DATA COMPRESSION. The purpose
of data compression in an organization is said to
be the reduction of the VOLUME OF AVAILABLE DATA
in order not to swamp the organization with trivial
information, while not reducing too severely
their information content. The aggregation of data
over unwanted CLASSIFICATION DIMENSIONS, IRRELEVANT
FOR THE PURPOSE AT HAND, attains reduction of volume.
For instance, sales transactions might be aggregated
along the dimensions of customer, salesman, industry,
and geographic region, leaving the data classified
in terms of the remaining dimensions - item and time
period.


What is said above has a strong intuitive appeal;
it recalls obvious experiences we all have had in
the context of simple EDP applications, and is clearly
related to much traditional thinking in statistics
where one talks about SUFFICIENT STATISTICS or
contractions of observations, sufficient for the
PURPOSES TO WHICH THE OBSERVATIONS MAY BE PUT, and
especially providing a SIGNIFICANT SAVING IN THE
MECHANICAL LABOR OF STORING AND PRESENTING DATA.


The same view on aggregation may be held in many
contexts of applied research and operations analysis.
A good illustration of such contexts is given for
instance by Ackoff (1962, p.126) who, in the context
of omitting uncontrolled variables in the building
and use of models, states that the aggregation of
several variables does not exactly omit any of the
variables, but it does reduce the number that have to
be considered. Ackoff also gives some examples of
aggregations from business applications.


AGGREGATION AND ERRORS 


Up to now everything seems OK; our interest in
aggregation first arose because of what
is said on ERRORS in the context of aggregation:
this is what we will cover next with a question in
our mind - "does aggregation help to attain better
accuracy?".


Emery (1969) in discussing qualitative aspects of the 
value of ACCURACY of information states that in the 
case of decision processes dealing with unaggregated
data, the VALUE of information may be highly SENSITIVE
TO ERRORS. When data are aggregated for high-level
decisions, Emery says, THE VALUE OF GREAT ACCURACY 
drops off sharply. The author illustrates this point 
with the case of an error in a bank account balance, 
which may be very expensive indeed, while its possi- 
ble impact on high level decisions using aggregate 
bank-deposits by state, is much weaker. 


While Emery makes his statements in the same context
as ours, i.e. data-banks and information systems, it
is interesting to note that his views seem to be
analogous to those expressed e.g. by Ackoff (1962, p.126)
in the much more constrained context of a well
structured applied research. Ackoff then states that where
variables are aggregated, the ERROR (in the estimate
of the outcome) which is introduced is ROUGHLY
PROPORTIONAL TO THE RATIO OF THE WITHIN-AGGREGATION
VARIANCE TO THE BETWEEN-AGGREGATION VARIANCE. Put in
another way, he says, it is desirable to make the
variables aggregated as homogeneous as possible and the
aggregations as heterogeneous as possible.
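
Ackoff's ratio can be illustrated numerically. The Python
sketch below compares two ways of aggregating the same
six values. It uses sums of squared deviations as a
stand-in for the two variances, which is our simplifying
assumption; the data and the name variance_ratio are
likewise hypothetical, and the sketch says nothing about
the constant of proportionality in Ackoff's statement.

```python
from statistics import mean


def variance_ratio(groups):
    """Within-group over between-group sum of squared deviations."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    within = sum((v - m) ** 2
                 for g, m in zip(groups, group_means) for v in g)
    between = sum(len(g) * (m - grand_mean) ** 2
                  for g, m in zip(groups, group_means))
    if between == 0:
        return float("inf")  # the aggregates do not differ at all
    return within / between


# The same six hypothetical values, aggregated in two different ways.
HOMOGENEOUS = [[10, 11, 12], [50, 52, 54]]  # like grouped with like
MIXED = [[10, 50, 12], [11, 52, 54]]        # unlike values lumped together

if __name__ == "__main__":
    print("homogeneous grouping:", round(variance_ratio(HOMOGENEOUS), 4))
    print("mixed grouping:      ", round(variance_ratio(MIXED), 4))
```

The homogeneous grouping gives a ratio far below that of
the mixed grouping. Note that the calculation requires
the unaggregated values to be known and the grouping to
be a deliberate choice.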


The above makes us believe that one interpretation of
such a view on aggregation and related error-accuracy
problems is that the variables refer to the
components of a so-called NEARLY DECOMPOSABLE HIERARCHIC
SYSTEM. Such near-decomposability implies that the
short-run behavior of each of the component
subsystems is APPROXIMATELY INDEPENDENT of the short-run
behavior of the other components, and that in the
long run, the behavior of any one of the components
depends IN ONLY AN AGGREGATE WAY on the behavior of
the other components. (Simon, 1969, p.100).


The striking implication of this proposal is that
one knows the implications of aggregation and
related errors, if the system-problem is assumed to have
been already solved in the sense suggested by Simon
(1969) or Langefors (1968b). One of the serious
difficulties of such an assumption, however, is the
common knowledge that information must be used and
errors estimated in business and social contexts where
obviously the assumption does not hold, since nobody
claims to have designed the system or defined its
goals etc. Furthermore, many data-banks will use and
store information which has been generated and which
will be used in unknown contexts, certainly not
designed nor understood in Simon's or Langefors'
system terms.


AGGREGATION AND ERRORS IN ECONOMICS 


Applications of economic science in business and
national planning make use of an enormous quantity
and variety of data or statistics which can very well
be imagined to be stored in data banks. In most
industrialized western countries such implementations
of data banks are to some extent already being done,
and it might be only a question of time before they
become commonplace.


Applications of economics to business and national
planning are much closer to our context of data-banks
and information systems for administrative control,
than the trivial applications to bank accounts
or assumed well structured problems of applied
research mentioned in the last section of this chapter.


It is therefore important what O. Morgenstern has
to report from an extensive experience in the
subject matter, in his book "On the Accuracy of Economic
Observations" (1963). We have edited the following
excerpts.


A whole economy is entirely inaccessible for
computation, unless drastic simplifications are
introduced. This leads to the process of aggregation,
i.e. the formation of larger entities from myriads
of components, which presents one of the most
important but also most troublesome problems of
economics. Too much aggregation mixes the unmixable
and gives us models that are easy to handle but
with low, if any, power of resolution. By aggregating,
errors of a new kind are introduced. (p.101)


It is possible that the influence of one error
which drives a number in one direction is exactly
offset by the influence of another error doing
the opposite, leading to a "true" figure for our
observation. But we have not MADE a true observation.
The notion that errors cancel out is widespread
and when not explicitly stated, it appears
as the almost inevitable argument of investigators
when they are pressed to say why their statistics
should be acceptable. YET ANY STATEMENT THAT
ERRORS "CANCEL", NEUTRALIZE EACH OTHER'S INFLUENCE,
HAS TO BE PROVED. Such proofs are difficult
and whether a "proof" is acceptable or not is not
easy to decide. (p.53)
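
The dependence of "cancelling" on unstated assumptions
can be shown with a short Python sketch. It is our
illustration, not Morgenstern's: 10,000 hypothetical
observations are aggregated twice, once with errors that
are independent and centered on zero, and once with the
same random errors plus a small systematic bias of 2 %.
The function name, the rates and the seed are
assumptions of the sketch.

```python
import random


def total_error(true_values, spread, bias, seed=0):
    """Error of the aggregate when each observation is off by a relative
    amount drawn uniformly from [bias - spread, bias + spread]."""
    rng = random.Random(seed)
    observed = [v * (1.0 + bias + rng.uniform(-spread, spread))
                for v in true_values]
    return sum(observed) - sum(true_values)


if __name__ == "__main__":
    true_values = [100.0] * 10_000           # true total is 1,000,000
    unbiased = total_error(true_values, spread=0.05, bias=0.0)
    biased = total_error(true_values, spread=0.05, bias=0.02)
    print(f"zero-mean errors, aggregate error: {unbiased:10.1f}")
    print(f"2 % systematic bias, aggregate error: {biased:10.1f}")
```

With zero-mean errors the aggregate error stays small
relative to the total; with the bias it grows in
proportion to the number of observations. Which of the
two cases applies cannot be read off the aggregate
itself, which is the reason the claim has to be proved.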


The mere repeated "checking" of the transcription 
of figures from some source and their correct 
transfer to other papers is no substitute for the 
determination of errors of observation and their 
significance for deductions and inferences. 

It is also necessary that WORTHLESS STATISTICS
BE COMPLETELY AND MERCILESSLY REJECTED ON THE
GROUND THAT IT IS USUALLY BETTER TO SAY NOTHING 
THAN TO GIVE WRONG INFORMATION WHICH — QUITE APART 


ways able to check the quality of the data pro- 
cessed by earlier investigators. THIS IS ESPECIAL- 
LY IMPORTANT IF DATA ARE TO BE USED IN EXTENSIVE 
AGGREGATIONS. When elaborate calculations are 
needed that are difficult to set up, this mislea- 
ding information may make the use of high-speed 
computing machines meaningless. (p.54) 


How can one evaluate what Morgenstern says in the
context of economics against for instance Emery's much
more optimistic view of the matter? Maybe the
answer lies in the assumptions. Maybe the answer is
suggested by what Morgenstern says on the success of
modern physics:
in physics errors were recognized for a very long
time; but they were held to be a secondary nuisance,
to be neglected and to be ignored by the
THEORY. Or as Brillouin expresses it: "The assumption
was that errors could be made 'as small as
might be desired', by careful instrumentation, and
played no essential role. Modern physics had to
get rid of these unrealistic schemes, and it was
indispensable to recognize the fundamental importance
of errors, together with the unpleasant fact
that they cannot be made 'as small as desired' and
must be included in the theory." (p.61)


This implies that aggregations will not entail any
difficulties when they are performed in an
information system dealing with problems which are well
explained by available theories, as in physics.
The situation will become much worse in the
context of social events such as found in business
and government where no established theory exists.


Such insight on the problems of errors and accuracy
in the context of aggregation is impossible within
the much narrower frame of accuracy suggested by the
literature reviewed in the earlier chapters.


AGGREGATION AND THE ACCURACY OF INVENTORY RECORDS:
A CASE STUDY


Appendix A3 presents some details of a case study 
on so-called inventory differences as signs of the 
inaccuracies of inventory records of the stock of 
completed parts in a plant manufacturing
electro-mechanical machines.


The results of the case study are not fully
exploited in this paper, but some of them can be used to
illustrate the vagueness of the implications of
aggregation in a situation in which the system
problem has not been solved, as well as to illustrate the
complexity of the quality-accuracy issue in terms
of the vague SOURCE and CODING errors mentioned in
the last chapter.


Plant management and auditors consider the accuracy
of the inventory records to be a very important
matter. Does this imply that they do not care about the
differences since they will in some sense "cancel
out" in aggregations over time and over items?
Certainly not, to judge from the existence itself
of a rotating inventory count and from the recurring
investigations on the nature of the differences found
through these counts. Also, certainly not, to judge
from the richness of the number of reports and
variables in the follow-up statistics on inventory
differences, most of them not usable for low-level
decision making.


It actually appears that higher levels of
management are very dependent on detailed knowledge of
differences. They are not interested in the
possibility that positive differences "cancel out"
negative differences. They must keep negative
differences down to some minimum because of e.g.


1. Danger of running into a line stop, leading to delays
in delivery of products and waste of idle resources.

2. Incurrence of extra costs for placement of
additional emergency orders.

3. Requirement to protect stockholders.

4. Losses leading to charges on the product price.


Positive differences must be kept down to a minimum
because of e.g.


1. Losses from interest on investment in too high
a stock.

2. Losses from not being able to take advantage of
the maximum allowable write-down of stock value.

3. Protection of the public from an over-evaluation
of assets.




In appendix A3 it is possible to see that no easy
conclusions can be drawn on the aggregate effects
of the errors listed in the summary list of errors
leading to inventory differences. If anything, it
is for instance possible to notice from the summary
table of the first investigation (1964) that
there are two kinds of causes that only contribute
to negative differences and never can cancel each
other.


The most interesting insight from the case study,
however, is that even if the differences cancel out,
the problem is to know HOW they cancel out, and to
what extent the way of cancelling is acceptable in
face of management's above listed seven objectives
in keeping differences to a minimum: for instance,
what is the amplitude and frequency of fluctuations
around the "true" value that would be considered
to be acceptable by certain particular stockholders?
An evaluation of aggregation and its errors is thus
seen to require an understanding of the total
system.
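
A small Python sketch shows what a net figure can hide.
The six differences are invented for the illustration
and are not taken from appendix A3; a difference is
assumed here to mean counted quantity minus recorded
quantity, so that a negative value is a shortage.

```python
def summarize(differences):
    """Summarize inventory differences (counted minus recorded, per item)."""
    shortages = [d for d in differences if d < 0]
    overages = [d for d in differences if d > 0]
    return {
        "net": sum(differences),                      # what "cancelling" reports
        "gross": sum(abs(d) for d in differences),    # total size of discrepancies
        "shortage_total": sum(shortages),             # exposure to line stops
        "overage_total": sum(overages),               # excess capital tied up
        "items_short": len(shortages),
    }


if __name__ == "__main__":
    hypothetical_differences = [-40, 25, -10, 30, 0, -5]
    for key, value in summarize(hypothetical_differences).items():
        print(f"{key:<15} {value}")
```

In this example the net difference is zero, yet three
items are short by 55 units in total and two items are
over by the same amount. Each of the objectives listed
above concerns one of the two sides, not their sum.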


Outside of the particular subject of aggregation,
the case study also shows the nature of many source,
coding or observation errors, as they were labeled 
in the review of literature on quality. Ignoring 
such kinds of errors appears to be equivalent to ig- 
noring the larger system in which the purely tech- 
nical EDP system is contained. It is not surprising 
that, to the extent that error rates can be measured 
in some way, the larger contribution to such rates 
in a complex social system will originate outside 
the strictly defined technical EDP subsystem. Con- 
centrating the quality effort on the technical aspects 
may thus be a grave suboptimization: it is something 
like avoiding the cause of difference listed under 
number 9 in the appendix A3, wrong punching, when 
the other 29 listed causes are not considered at all. 


The list of causes of differences also shows how ma- 
ny so-called "human factor" errors may be in their 
turn considered as caused by the inflexibility of 
the EDP program itself (for instance see points 13, 
23, 26). Such facts should have far reaching organi- 
zational implications in future complex systems. 


Finally we can very concretely notice the absence of 
the "objective" observer of the "true" inputs. The 
rotating inventory clerk is the verifier-observer of 
the stock clerks' activities; the three reported in- 
vestigations were performed by verifiers of the veri- 
fiers, i.e. by objective observers of both the rota- 
ting inventory and stock clerks; and our own present 
study can be seen as a further step of verification 
or "objectiveness" - we are discussing the meaning 
of the accuracy of those who checked the accuracy of 
the rotating and perpetual inventory system. A
discussion of the accuracy of the follow-up statistics
summarized in appendix A3 would be a concrete
document of the vagueness of the complex accuracy issue.


A superficial examination of the summary of the
contents of follow-up statistics on inventory
differences, as displayed in appendix A3, discloses the
creation of a great number of "aggregation variables",
out of the basic original observations of
differences. These extensive statistics and tabulations
relate only indirectly to the basic problems as
illustrated by the list of causes of differences. This
suggests to us the applicability of Ackoff's
statement, originating from a number of experiences in the
field of operations research: "The less we understand
a phenomenon, the more variables we require to
explain it. Hence the manager who does not fully
understand the phenomenon that he controls plays it
'safe' and wants as much information as he can get."


This suggests that the vagueness of the complex
accuracy issue leads to the use of aggregations whose
aim is not data-compression for preventing the need
to communicate large volumes of trivial information.
Aggregations may then rather be used in the attempt
to remedy lack of knowledge on the nature of
errors or lack of control on them, by massive
data-processing of the information that happens to be
available on them. Such a perspective is just one
alternative to the image of tomorrow's sales manager
who, when confronted with an unfavorable trend of
sales, sits down at an on-line terminal requesting
all possible aggregations and statistical tests to
be performed on past sales transactions, "searching
for patterns in the data". We obviously question
the belief that such a procedure will substitute for the
direct understanding of the original object system;
the available resources might better be applied to
such an understanding.


CODING 


There is some evidence that the broad subject of 
coding in the context of information systems and 
data-banks is not completely understood. 


In the EDP literature, coding has at its best been
considered as a communication tool, and it has been
evaluated in technical terms: communication-economy
through a channel, economy of identification in the
storage and retrieval of information, etc. Codes have
been developed with primary attention given to
machine processes, in order to facilitate machine
operations, such as the "tight" coding on many
80-column punched cards.


In more recent years, as suggested by some of the
literature reviewed in the previous chapters, some
people have realized the need to "design human
factors into" the code structure in order to
minimize, for instance, transcription or
digit-manipulation errors by humans, who are then
also considered as "communication channels".


In view of coming ambitious projects for implementa- 
tion of complex data-banks and information systems, 
we think that the time has come to enlarge the above 
view on the meaning of coding. 


One possibility is to integrate the communication- 
approach into the body of modern organization theory. 
An organization may create CATEGORIES for classify- 
ing situations and events. Such classification sche- 
mes are the basis for the program-evoking aspects of 
communication: once the event has been classified, 
the appropriate program can be executed. (March & 
Simon, 1958, p.162) 


The above can be illustrated as follows. As soon as
one knows in a manufacturing plant that a particular
item is not a detail part but rather an assembly, to
be bought from a local vendor to whom the plant will
have to consign all of the detail parts to be
assembled, then the particular item is to be coded
CQ-509 in the perpetual inventory file. This file
will later be used e.g. in the requirement generation
program.


Another illustration may be taken from the EDP
application for updating perpetual inventory records
of the manufacturing plant. If a particular item was
previously requisitioned from stock in order to be
quality-inspected, and it is found that it is not
usable and cannot be reworked but must rather be
scrapped, then the transaction to the EDP application
program must be coded 5119 O08. The transaction will
then be processed, updating stock status, and later
it will also be used e.g. in accounting applications.
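
The program-evoking role of categories described above
can be sketched in a few lines of Python. This is only
an illustration under our own assumptions: the handler
functions, the label "SCRAP" and the returned texts are
hypothetical, and only the code CQ-509 is taken from
the example in the text.

```python
from typing import Callable, Dict

# Hypothetical handlers: each stands for an organizational "program"
# that is evoked once an event has been classified.
def consign_details_to_vendor(item: str) -> str:
    return f"{item}: consign detail parts to vendor, buy back assembly"

def scrap_after_inspection(item: str) -> str:
    return f"{item}: reduce stock status, post scrap to accounting"

# The coding scheme: category code -> program to be executed.
# "CQ-509" comes from the text; "SCRAP" is a placeholder label.
PROGRAMS: Dict[str, Callable[[str], str]] = {
    "CQ-509": consign_details_to_vendor,
    "SCRAP": scrap_after_inspection,
}

def process(code: str, item: str) -> str:
    """Execute the program evoked by the category code of an event."""
    handler = PROGRAMS.get(code)
    if handler is None:
        # The event falls outside the classification scheme:
        # no program exists for it.
        raise KeyError(f"no program for code {code!r}")
    return handler(item)
```

An event that fits none of the categories evokes no
program at all; the coder must then either stop or
choose the "nearest" existing code.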


A second possibility to enlarge the view on the
meaning of coding is to regard it as one method of
expressing measurements: in some measurements one
attempts to assign e.g. objects to classes, while in
others one tries to establish a specific relationship
or attempts to assign numbers. A coding system will
then be a language for class assignment, whose rules
are the means by which a decision-maker uses the
information expressed in the language. A perfectly
"adequate" language of class assignments must meet
all potential informational requirements, that is,
must provide an exhaustive classification.
(Churchman, 1961, p. 106)


The striking consequence of this enlarged view of
coding is that it becomes much more than a question
of economy in communications, storage, and retrieval.
It becomes an information problem that reaches beyond
hardware, software or human-factors considerations.


FORCING REALITY TO FIT THE MODEL


CODING MAY THEN BE REGARDED AS THE COUPLING,
INTERFACE, OR MEASUREMENT PROCESS LINKING THE REAL
WORLD (OBJECT SYSTEM WHICH IS TO BE CONTROLLED) TO THE
MODEL REPRESENTED BY THE INFORMATION SYSTEM OR BY
THE INFORMAL HANDLING OF THE CODED INFORMATION.

The importance of this insight for our inquiry into
errors and quality of information derives from the
possibility to regard CODING ERRORS not only as
caused by the "human factor" or by non-understanding
of the model by the human coder. The coding errors
may also be seen as caused by NON-ADEQUATENESS OF THE
MODEL itself, i.e. by MODEL ERRORS in not taking
into account relevant aspects of the real world,
including the social system and the humans who are
supposed to work with the model.


As a simple illustration, consider the research
reported by Cardozo & Leopold (1963) and its extension
by Van Gigch (1970a and 1970b). Their results suggest
the existence of a maximum human communication load,
above which human communication error rates are
expected to increase steeply. Codes belonging to a
code-scheme can be interpreted in terms of such
communication load. If the load is too high, many
coding errors will be committed. What is the
conclusion? Before we had available the referenced
research, or to the extent that it is not accepted as
part of "established" psychological theory, we would
have claimed that the errors were OBSERVATION or
CODING errors, requiring e.g. better discipline and
training of the human subjects. To the extent that
the research is accepted and perhaps incorporated in
a theory, we would instead claim that the errors were
MODEL errors: the system designer has allowed the
disorganized growth of interdependent EDP programs
which impose their own coding scheme without
consideration of the known "facts" on human
constraints. The system designer will have to improve
his training and discipline.


The above could have been reached by sheer common
sense. What the enlarged view of coding enables us to
do is, hopefully, to integrate and evaluate many
different concurrent interpretations of coding errors
in a particular situation, in terms of scientific
method.


As a more complex illustration of the implications of
the enlarged view on coding, reconsider the case of
coding of items in the perpetual inventory file of
the manufacturing plant. What would happen in a case
when some but not all detail parts are to be
consigned to the vendor from whom the plant buys the
completed assembly? Or in the case when some of the
detail parts turn out to be assemblies in their turn?
What will the coder do, what will his "information
load" be if many such sometimes unique exceptions
appear every day, and what will the consequences of
his coding decision be in terms of the system
designer's or programmer's understanding of the
coder's environment, code scheme, and program logic?


3.3.2

Inaccuracies may in such situations arise from the
coder's attempts to FORCE REALITY TO FIT THE MODEL.
In our case study on inventory differences this could
be illustrated by the stock clerk reporting a stock
location as being 999 whenever he has to store a
certain item in a "third" stock location. He knows
that in this way he will prevent errors of the type
listed under number 26 of the list on causes of
differences: parts not found because the EDP program
allows a reporting of at most two stock locations for
the same item, and deletes the record of the first
upon reporting of the third. The stock clerk knows
that each time he reads 999 as second stock location
of an item, this means that he must turn to his own
manual records or to a common stock location where
many such items are stacked, leading perhaps to
errors of the type listed under number 14: parts are
not found because too many different parts are stored
at the same stock location, so that they are easily
overlooked.
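
The mechanism behind the 999 convention can be made
concrete with a small Python sketch. It is our own
reading of the description above, not the plant's
actual EDP program: the class and function names and
the location labels are hypothetical, and we assume
that the program simply drops the oldest location
when a third one is reported.

```python
from typing import List, Tuple

MAX_LOCATIONS = 2   # limit of the EDP program, as stated in the text
OVERFLOW = "999"    # the stock clerk's private convention

class ItemRecord:
    """Location part of an inventory record (hypothetical model)."""

    def __init__(self) -> None:
        self.locations: List[str] = []

    def report(self, location: str) -> None:
        self.locations.append(location)
        if len(self.locations) > MAX_LOCATIONS:
            # Assumed behaviour: the first location is silently deleted.
            del self.locations[0]

def naive_report(real_locations: List[str]) -> List[str]:
    """Report every real location; earlier ones are lost without trace."""
    record = ItemRecord()
    for loc in real_locations:
        record.report(loc)
    return record.locations

def clerk_report(real_locations: List[str]) -> Tuple[List[str], List[str]]:
    """Force reality to fit the model: report 999 from the third location
    on, and keep the full list in a manual ledger outside the system."""
    record = ItemRecord()
    ledger: List[str] = []
    for index, loc in enumerate(real_locations):
        ledger.append(loc)
        if index < MAX_LOCATIONS:
            record.report(loc)
        elif OVERFLOW not in record.locations:
            record.report(OVERFLOW)
    return record.locations, ledger
```

With three real locations, naive_report keeps only the
last two, while clerk_report leaves 999 in the second
position as a signal that the EDP record is incomplete.
The "error" in the file is thus a deliberate message
which the model has no other way of carrying.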


How are such errors to be evaluated in other
environments (object systems) which may be much more
complex than the stockroom of a manufacturing plant,
especially whenever there are no resources for
adapting inflexible information systems, coding
systems and EDP programs to a changing environment?


A striking cybernetic interpretation of the deep im- 
plications of what has been said above, for the pos- 
sibilities to control organizations, is given by 
Beer (1966). It is reviewed here below in terms of 
edited abstracts. (p.310) 


A CYBERNETIC INTERPRETATION, AND OTHER INTERPRETATIONS 


On the shop floor, one can always find an example 
of a machine-loading arrangement which "controls 
the flow and allocation of material around the 
shop". What it actually does is to make desperate 
attempts to keep the job cards posted as they are 
returned - to provide something like an accurate 
reflection of what is going on. To the objective 
cybernetician, then, the shop floor is a control 
system generating variety for the purpose of con- 
trolling the planning office, and not vice versa. 
The reasons for this unhappy example have been 
formally uncovered. They are: lack of requisite 
variety, disobedience of the theorems of communi- 
cation about channel capacity and so on, AND ABOVE 
ALL, A STATIC, INADEQUATE, UNADAPTIVE MODEL OF 
WHAT THE WORLD SITUATION USED TO BE LIKE SEVERAL 
YEARS AGO.


Fortunately, however, control procedures have a
way of keeping themselves viable and of rectifying
their mistakes: by means of "ad hoc" comparisons
of real events with their predictors, the control
subsystem struggles in a horribly inefficient way
to acquire a certain adaptability. GIVEN THAT THE
PROPORTION OF NEW EVENTS IS QUITE LOW (new events
being precisely what this kind of control is very
bad at handling), and given the capability to
organize the feedback information, everything
usually can run fairly smoothly.


The trouble, however, is that in the course of 
time, BECAUSE OF THE VITAL NECESSITY FOR CREATING 
CONTINUOUS AND DETAILED FEEDBACK, the control 
organization must be allowed to grow and become 
prohibitively expensive in terms of personnel, 
facilities, and equipment including large-scale 
electronic data processing equipment. 


But nobody notices that this is a fault in the
state of affairs, because it is too familiar, and
because the energies of all concerned are totally
absorbed in arguing the merits of alternative
computers. Typically, the absurdities inherent in
the situation are obscured by the APPEARANCE of
modernity and technical competence which all this
activity betokens.


Beer's cybernetic interpretation is paralleled by 
Blumenthal's system-positivistic interpretation and 
description of troubles at higher organizational le- 
vels. (Blumenthal, 1969, p.197). Disorganized incre- 
mental growth of so-called systems where 
system is piled on system, or a continuing series 
of minor enhancements is made to existing systems, 
in an attempt to generate relevant management in- 
formation as a hopefully serendipitous by-product 
accompanying the production of increasingly vaster 
quantities of irrelevant, unused, or merely his- 
toric data... 


As the information pile-up occurs another system 
modification or addition is created to produce 
ostensibly only that which is wanted in the situ- 
ation... 


This process is a form of change, true; but it is 
only marginally and fitfully adaptable change. 
Ultimately the patchwork collapses. Systems beco- 
me moribund, and, like dead horses, no longer res- 
pond to the whip. 


Blumenthal goes on giving, in a positivistic mood,
the answer to this problem: a new dynamics of
adaptable systems growth. This appears to us a new
aspect of, and name for, the ever pervading problem
of model building, in this case of information
systems. What bothers us, however, is the implication
of all that was said above for our issue of
quality-accuracy of information, as well as for the
related "errors" made at distinct organizational
levels.




It is easy to imagine that the terrible descriptions
of serious problems by Beer and Blumenthal must also
mean serious things happening to the errors and
accuracy at different levels of the information
system. We tried to illustrate this by means of the
case study on inventory differences, but the
illustration is obviously incomplete in many respects.


The reviewed literature does not offer any example
of the problem. We could guess about an example by
reading "between the lines" of Orlicky's discussion
of the so-called integrity of an average manufacturing
routing file, and its maintenance. (Orlicky, 1969,
p. 153)


Such a file will consist of several tens of
thousands of records encompassing active, inactive,
semiobsolete, and obsolete parts. Each of these
records carries the prescribed sequence of
operations, their descriptions, the routing to the
various departments and machine tools within these,
job standards, and the required tooling, not to
mention part master data in the record header.


This file is constantly affected by so many chan- 
ges in manufacturing method, standards, tooling, 
engineering changes, machine tool procurement, 
downgrading and retirement, shop reorganizations, 
etc., that true file maintenance becomes a night- 
mare. This is so BECAUSE MANY TYPES OF CHANGE LI- 
TERALLY EXPLODE THROUGHOUT THE FILE (such as in 
case of adoption of a new class of cutting tools, 
changes in departmental boundaries, or the acqui- 
sition of new productive equipment). A single 
such change MAY CALL FOR HUNDREDS, OR EVEN THOU- 
SANDS, OF PARTS TO BE REROUTED, operations to be 
added or deleted, and methods, standards, and too- 
ling revised accordingly. 


It is possible to guess what such requirements mean
in terms of impact on the ACCURACY OF CODING.
Orlicky goes on to state that the key to this problem
is the staffing and budget provided for file
maintenance. Our broad concept of the nature of the
coding process allows us to frame this statement in
concert with e.g. Beer's and Blumenthal's: when
things begin to look as in Orlicky's description, a
better contribution to overall accuracy might come
from an improved system design (with built-in human
factors considerations) rather than from increased
staffing and budget for file maintenance.


We hold that the quality of information, particularly
as expressed in the nature and rates of coding
errors, may be an important indicator of the adequacy
of system design or of the model. Up to now it has
been regarded as an indicator mainly of the coding
and observation process itself.


3.4
GENERAL COMMENTS ON THE CONTENTS OF THIS CHAPTER


After showing in the previous chapter that the
empirical approach to quality of information assumes
a narrow concept of quality, and that it does not
dispense with a sound theoretical understanding of
the issue, we attempted in this chapter to show how
a too narrow understanding of the issue misses
important problems arising e.g. during use of data
banks and information systems.


Two such less obvious problems are the considerations
of accuracy in the context of coding and aggregation.
The optimistic view on the aggregation of data
assumes that the system-problem is already solved,
and this was seen to be unjustified, as suggested by
problems in economic science and by a case study on
inventory differences in a manufacturing plant. The
optimistic view on coding regards it as a
communication tool for efficient machine processing
and misses the possibility of regarding it as a
measurement process where "errors" may indicate model
inadequacies.


Finally, we may note that the issues of coding and
aggregation appear to be closely related. When one
aggregates, for example, sales transactions along the
dimensions of customer, salesman, industry and
geographic region, this corresponds to the creation
of a new set of sales data where the above dimensions
make no difference and are therefore coded as
belonging to one and the same class. The assumption
is that the new code defines a class of information
that will be useful for some particular decision.
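
The correspondence between aggregation and coding can
be sketched as follows in Python. The sales records,
field names and amounts are invented for illustration
only; the point is that aggregating along a dimension
amounts to re-coding all its values into one class.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Invented sample transactions (illustrative data only).
SALES: List[dict] = [
    {"customer": "C1", "salesman": "S1", "industry": "steel",
     "region": "north", "amount": 100},
    {"customer": "C2", "salesman": "S2", "industry": "paper",
     "region": "north", "amount": 50},
    {"customer": "C3", "salesman": "S1", "industry": "steel",
     "region": "south", "amount": 70},
]

def aggregate(transactions: List[dict],
              keep: Tuple[str, ...]) -> Dict[tuple, int]:
    """Sum amounts over classes defined by the kept dimensions.

    Every dimension not listed in `keep` is treated as making no
    difference: all its values are coded into one and the same class.
    """
    totals: Dict[tuple, int] = defaultdict(int)
    for t in transactions:
        class_code = tuple(t[d] for d in keep)  # the "new code"
        totals[class_code] += t["amount"]
    return dict(totals)

by_region = aggregate(SALES, keep=("region",))
# {('north',): 150, ('south',): 70}
grand_total = aggregate(SALES, keep=())
# {(): 220}
```

Whether the classes so created are useful depends on
the decision at hand; the arithmetic is error-free,
but the choice of the kept dimensions is itself a
coding decision and may be a model error.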


CONCLUSIONS FROM THIS CHAPTER 


1. Without an understanding of the information-qua- 
lity issue, aggregation of data may be uncritical- 
ly accepted as being error-free in the context of 
high-level decision making. 


2. Without an understanding of the information-qua- 
lity issue, it is possible to miss the evaluation 
of coding errors and coding difficulties as 
symptoms of inadequate model building or system 
design. 


As a contribution to improved system design, we shall
try in the next chapter to develop the concepts of
ACCURACY and PRECISION as two aspects of the broader
understanding of quality of information, to be
operationalized and "built into" data-banks or
information systems for administrative control.


